ABSTRACT


Scheduling analysis is one of the most important activities in hard real-time
systems development, since the correctness of hard real-time systems depends not only on
the logical results of computation, but also on the time at which the results are produced.
This dissertation aimed at the development of both fundamental theory and software tools
to support efficiently and reliably the scheduling of distributed hard real-time systems. The
major work of this dissertation focuses on non-preemptive hard real-time scheduling for
periodic and sporadic task sets, although some of the results are also applicable to the
preemptive case.

Several theorems for checking the schedulability of non-preemptive task sets are
developed. Previous results on necessary and sufficient conditions for scheduling
non-preemptive task sets are extended to cover the case when the task deadlines can be smaller
than or equal to their periods. The concept of transient and cyclic schedules is introduced to
overcome the weakness of the traditional methods, which restrict the construction of a
cyclic schedule to a fixed interval of length equal to the least common multiple of the
periods. An algorithm for reducing the schedule length of periodic task sets is developed
to further enhance the schedulability of hard real-time systems. A preliminary study on
randomly generated graphs shows that the algorithm does produce near-optimal solutions.

To ease the problem of synchronization among tasks in distributed hard real-time
systems, we introduce the Fundamental Synchronization Theorem and a novel model for
designing distributed hard real-time systems without explicit synchronization, and develop
an Ada95 software architecture to support such a model. The application of this theorem
will allow us to treat each set of tasks allocated to a particular processor as a totally
independent set, if the tasks satisfy the conditions described in the theorem. This approach
will greatly decrease the difficulties in scheduling large distributed real-time systems.

One  of  the  necessary  steps  in  distributed  hard  real-time  scheduling  is  the  allocation 
of  tasks  to  different  processors  in  the  distributed  system.  Algorithms  for  task  allocation 
which  minimize  the  inter-module  communication  costs  are  developed  and  implemented. 

Finally, a timing model for handling different time references in rapid prototyping
systems is introduced, to support the reuse of real-time components.
I. INTRODUCTION TO HARD REAL-TIME SYSTEMS


A.  INTRODUCTION 

Traditionally, most real-time systems have been built for military purposes. As
computers become faster, less expensive, and more reliable, a tendency towards
automation is emerging in virtually every field of activity. Areas in which real-time
systems are being more widely employed include manufacturing, communications,
defense, transportation, aerospace, energy, and health care.

“Hard real-time systems” are defined as those systems in which the correctness of
the system depends not only on the logical results of computation, but also on the time at
which the results are produced. They are also characterized by the fact that severe
consequences will result if logical as well as timing correctness properties of the system
are not satisfied. [SR88]

To put it briefly, real-time systems differ from traditional systems in that deadlines
or other explicit timing constraints are attached to the tasks or processes.

Audsley and Burns presented a very interesting approach [AB93], where the time
taken to complete a task is mapped against the value this task has to the system,
developing the so-called time-value functions. This work proposes an adaptation of their
approach to be used by CAPS, where the time-critical tasks could have several kinds of
deadlines, as shown in Figure 1.1. Tasks with hard deadlines may cause damage to the
system if they start early or finish late. Tasks with soft deadlines convey the main idea of
“better late than never”, and the tasks with hybrid deadlines can be assumed to have a soft
deadline behavior until a certain point in time, but then they become hard deadline tasks,
generating damage to the system. Using this approach, it is possible to determine whether
it is more convenient to preempt a task that has not finished within its deadline or keep it
running. This approach provides a much better representation for a task deadline than
that achieved by merely calling it a soft or a hard deadline.
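The three deadline kinds can be illustrated with a small time-value function. This is a minimal sketch, not the functions drawn in Figure 1.1: the class `Deadline`, the functions `value_at` and `keep_running`, the penalty of -1, and the hyperbolic decay used for late soft completions are all assumptions chosen for illustration, and the damage caused by starting a hard task too early is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deadline:
    """Hypothetical description of a task deadline (all fields are assumptions)."""
    kind: str                          # "hard", "soft", or "hybrid"
    deadline: float                    # latest completion time that earns full value (> 0)
    hard_limit: float = float("inf")   # hybrid only: time after which lateness causes damage
    full_value: float = 1.0            # value of an on-time completion
    penalty: float = -1.0              # assumed negative value once damage occurs


def value_at(d: Deadline, completion: float) -> float:
    """Value contributed to the system if the task completes at `completion`."""
    if completion <= d.deadline:
        return d.full_value
    if d.kind == "hard":
        return d.penalty               # finishing late is damaging
    if d.kind == "hybrid" and completion > d.hard_limit:
        return d.penalty               # soft behavior has expired
    # Soft behavior ("better late than never"): the value shrinks but stays positive.
    return d.full_value * d.deadline / completion


def keep_running(d: Deadline, projected_completion: float) -> bool:
    """True if letting a late task finish still adds value; otherwise preempt it."""
    return value_at(d, projected_completion) > 0.0
```

With these assumed shapes, a late soft task is always worth finishing, a late hard task never is, and a hybrid task is worth finishing only while its projected completion stays within `hard_limit`.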




In general it can be said that there are three types of tasks, depending upon their
deadline characteristics. Periodic tasks execute on a regular basis, and usually
have a period and a required execution time. Aperiodic tasks (also known as
non-periodic tasks) are essentially random tasks triggered by some external event. Aperiodic
tasks may also have some timing constraints that limit their maximum start or finish time.
However, if aperiodic tasks are allowed to have hard deadlines (in other words, if they are
allowed to have negative values once the deadline is missed), worst-case analysis cannot be
carried out without further restricting their timing behavior. This is the rationale
behind the third type of task, the sporadic task, in which a minimum period between any
two aperiodic events is required. [AB93]


Figure  1.1.  Types  of  Task  Deadlines 

In  addition  to  timing  constraints,  a  task  can  have  other  constraints,  such  as  [SR88]: 

1) resource constraints - which note the resources required during the execution
of the task

2)  precedence  constraints  -  that  specify  a  partial  (perhaps  total)  ordering  on  the 
execution  of  the  tasks 

3)  concurrency  constraints  -  that  describe  which  tasks  can  run  concurrently,  to 
share,  for  example,  a  resource 
4) placement constraints - which note whether a given task is to run on a specific
processor

5)  criticalness  -  which  is  the  relative  value  to  the  system  that  is  associated  with 
some  specific  task  when  it  meets  its  deadline 

6)  preemptiveness  -  determining  whether  a  task  can  be  interrupted  by  other  tasks 
and  resume  execution  afterwards 

7)  communication  requirements  -  that  note  issues,  such  as  acceptable  delays,  for 
inter-task  communications  and  synchronization  protocols 

Task  scheduling  in  hard  real-time  systems  can  be  either  static  or  dynamic.  In  static 
scheduling  it  is  assumed  that  all  information  about  the  tasks  is  known  a  priori,  and  the 
schedule  is  usually  generated  off-line.  In  dynamic  scheduling,  although  all  information 
about  the  tasks  may  be  known  a  priori,  they  are  allowed  to  be  dynamically  invoked,  and 
the  schedule  is  calculated  “on  the  fly”.  There  has  been  a  great  deal  of  debate  about  the 
appropriateness of dynamic algorithms for hard real-time systems. Many people
favor static scheduling because it seems reasonable to require that, for safety-critical
applications, schedulability be guaranteed before execution [AB93].

B.  REVIEW  OF  PREVIOUS  WORK 

According to Baker [Bak74], scheduling is the allocation of resources over time to
perform a collection of tasks. This rather general definition conveys the basic idea of
scheduling theory, which is a collection of principles, models, techniques and logical
conclusions that provide insight into the scheduling function.

Many of the early developments in the field of scheduling were motivated by
problems arising in manufacturing. Today, even though scheduling is used in many
different areas, there are still references that deal with machines instead of processors, and
with jobs instead of tasks.

In  order  to  have  a  better  understanding  of  the  context  in  which  scheduling  issues 
are  found,  it  is  reasonable  to  begin  by  proposing  a  taxonomy  for  the  scheduling  function. 




This  taxonomy  is  an  enhancement  of  that  proposed  by  Cheng,  et  al.  [CSR87]  and  is 
illustrated  in  Figure  1.2. 

As shown in the figure, classical scheduling can be divided into four major areas:
single-machine, parallel-machine, flow shop, and job shop scheduling. Most of
these areas make use of objective functions, such as minimizing flowtime, minimizing
mean tardiness, and minimizing completion time (makespan), which do not convey much
of the important information needed by real-time systems. In most of these problem areas,
the deadline concept is not even considered. Nevertheless, some of these results can
provide very fruitful insights into real-time scheduling problems. Another issue that is not
considered in many of the problems associated with classical scheduling is the idea of
periodic tasks, meaning tasks that run forever. For further reading on classical scheduling
the reader is directed to the work of Baker [Bak74] and Stankovic, et al. [SSN93]. The
latter reference presents a concise survey on the implications of classical scheduling results
for real-time systems.
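To make the contrast with deadline-based analysis concrete, the following sketch computes the three classical objectives named above for one fixed job sequence on a single machine. The function name `schedule_metrics`, the tuple layout, and the assumption that every job is available at time 0 (so flowtime equals completion time) are illustrative choices, not notation from the cited references.

```python
def schedule_metrics(jobs):
    """Evaluate one single-machine sequence.

    jobs: list of (processing_time, due_date) tuples, in execution order.
    All jobs are assumed to be available at time 0 and run without idle time.
    """
    clock = 0
    completions = []
    for processing_time, _due in jobs:
        clock += processing_time
        completions.append(clock)

    # Tardiness is lateness clipped at zero: early jobs earn no credit.
    tardiness = [max(0, done - due) for done, (_p, due) in zip(completions, jobs)]

    return {
        "total_flowtime": sum(completions),
        "mean_tardiness": sum(tardiness) / len(jobs),
        "makespan": completions[-1],
        "missed": sum(1 for t in tardiness if t > 0),
    }
```

For the sequence `[(3, 4), (2, 4), (4, 10)]` the completion times are 3, 5 and 9, so the total flowtime is 17, the makespan is 9, and the mean tardiness is 1/3. Only the extra `missed` counter reveals that one job finished after its due date, which is the piece of information a hard real-time analysis cannot do without.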


Figure  1.2.  Scheduling  Taxonomy 


Tasks can also be distinguished as preemptable or non-preemptable. A task is
preemptable if it can be interrupted by other tasks and can resume execution afterwards.
A non-preemptable task, once started, must run to completion.
Another concept that requires introduction is the difference between
multiprocessor systems and distributed systems. In multiprocessor systems, the cost of
interprocessor communications is negligible, as the different processors usually have some
kind of shared memory and a global clock. In distributed systems, the cost of
interprocessor communications is not negligible, as the processors do not share any
memory space and each processor has its own clock. It is now appropriate to make a brief
review of some previous work done in hard real-time scheduling, with an emphasis on the
results related to static scheduling.

1.  Preemptive  Static  Scheduling 

In cases where the tasks are periodic, which is the most common case in
real-time systems, it can be said that the most important result for the uniprocessor case
was provided by Liu and Layland [LL73]. They proved that the Earliest Deadline First
(EDF) algorithm is optimal for any set of independent periodic tasks, where optimality is
defined by the statement, “if a set of tasks can be scheduled by any algorithm, then it can
be scheduled by the EDF algorithm”. They also demonstrated some bounds on processor
utilization when using this algorithm. Their results were extended to cover cases where
the release times are arbitrary by Jeffay [Jef89a]. Also based on Liu and Layland's work,
a more elaborate schedulability test was proposed by Lehoczky, et al. [LSD89]. This test
employed the concept of processor time demand for handling cases where the deadlines
were smaller than the periods. Sha and Lehoczky [LS86] described a technique of
splitting the periods so that better processor utilization could be achieved.
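The utilization bound for preemptive EDF can be checked in a few lines. The sketch below assumes independent, preemptable periodic tasks on one processor whose deadlines equal their periods; under those assumptions a task set is schedulable by EDF exactly when its total utilization does not exceed 1. The function names and the `(computation_time, period)` tuple layout are illustrative.

```python
from fractions import Fraction


def utilization(tasks):
    """Total processor utilization of tasks given as (computation_time, period)."""
    return sum(Fraction(c, p) for c, p in tasks)


def edf_schedulable(tasks):
    """Utilization test for preemptive EDF on one processor.

    Valid only for independent periodic tasks with deadline == period.
    Exact rational arithmetic avoids rounding errors right at the bound.
    """
    return utilization(tasks) <= 1
```

For example, `[(1, 4), (2, 5), (1, 10)]` has utilization 3/4 and passes the test, while `[(3, 4), (2, 5)]` has utilization 23/20 and fails it. The test does not carry over to the non-preemptive case that is the main subject of this dissertation.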

Horn [Hor74] developed an optimal O(n^2) algorithm that was also based
on the earliest deadline first principle. Originally formulated for non-periodic tasks, this
algorithm proved capable of handling independent tasks with arbitrary deadlines and
release times in a uniprocessor environment. For the same type of tasks, he also
introduced an algorithm for the multiprocessor case that was based on the network flow
method. Martel [Mar82] extended the work of Horn by allowing for processors with
different speeds.
For multiprocessor scheduling of periodic tasks, most researchers have
adopted a partition approach, where some kind of bin-packing algorithm is used to
determine the sub-optimal partitions. Examples can be found in the work of Davari and
Dhall [DD86], Bannister and Trivedi [BT83], and in that of Dhall and Liu [DL78].
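The partition approach can be sketched with a generic first-fit-decreasing heuristic. This is not the specific algorithm of any of the papers cited above: the function name `first_fit_partition`, the ordering by decreasing utilization, and the per-processor capacity of 1 (the preemptive EDF bound for deadlines equal to periods) are assumptions made for illustration.

```python
from fractions import Fraction


def first_fit_partition(tasks, processors, capacity=Fraction(1)):
    """Assign periodic tasks to processors by first-fit decreasing utilization.

    tasks: list of (computation_time, period) tuples.
    Returns one list of task indices per processor, or None if the heuristic
    fails to place some task (the task set may still be partitionable).
    """
    load = [Fraction(0)] * processors
    assignment = [[] for _ in range(processors)]

    # Consider the heaviest tasks first; ties keep their original order.
    order = sorted(range(len(tasks)),
                   key=lambda i: Fraction(*tasks[i]), reverse=True)

    for i in order:
        u = Fraction(*tasks[i])
        for k in range(processors):
            if load[k] + u <= capacity:
                load[k] += u
                assignment[k].append(i)
                break
        else:
            return None  # no processor had room for task i
    return assignment
```

Because bin packing is itself NP-hard, a heuristic of this kind yields a sub-optimal partition in general, which is why a `None` result does not prove that no feasible partition exists.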

2.  Non-Preemptive  Static  Scheduling 

There has been a great deal of research in the area of preemptive real-time
scheduling. For the non-preemptive case, however, most problems have been shown to be
NP-hard, even in the uniprocessor case. Hence, the majority of the work that has been
done in this area covers very specific cases, such as when unit computation times are
involved, or when release times are the same. Moore [Moo68] showed that the earliest
deadline algorithm is optimal for scheduling a set of independent tasks that have the same
release time. Bratley, Florian and Robillard [BFR71] developed an implicit enumeration
algorithm to determine a schedule for non-preemptive tasks with arbitrary release times
and deadlines. Baker and Su [BS74] used a similar approach to minimize the maximum
tardiness of tasks. Erschler, et al. [EFM83] developed a necessary condition for
scheduling tasks with arbitrary release times and deadlines. For periodic task
sets, which are the main focus of this study, the major results can be
found in the work of Mok [Mok83], Xu [XP90], Jeffay [JSM91] and Zhu [ZLC94].
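The common-release-time case mentioned above is simple enough to sketch directly: order the tasks by deadline, run them back to back without preemption, and check each completion time. The function name `edf_common_release` and the `(name, computation_time, deadline)` layout are illustrative, and deadlines are taken to be absolute times.

```python
def edf_common_release(tasks, release=0):
    """Non-preemptive earliest-deadline ordering for tasks released together.

    tasks: list of (name, computation_time, absolute_deadline).
    Returns (order, feasible), where order lists task names by deadline and
    feasible is True when every task completes by its deadline.
    """
    ordered = sorted(tasks, key=lambda task: task[2])
    names = [name for name, _c, _d in ordered]

    clock = release
    for _name, computation_time, deadline in ordered:
        clock += computation_time        # runs to completion, no preemption
        if clock > deadline:
            return names, False
    return names, True
```

Once release times differ, this simple ordering is no longer sufficient, because a non-preemptive processor may have to stay idle to wait for an urgent task; that is the source of the hardness results discussed in this section.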

3.  Summary  of  Scheduling  Complexity 

In  dealing  with  scheduling  problems  where  most  of  the  input  instances  have  been 
proven  to  be  NP-hard,  it  is  very  important  and  beneficial  to  know  in  which  class  a 
particular  instance  belongs,  so  that  the  problem  can  be  addressed  appropriately.  However, 
when  one  looks  into  the  huge  amount  of  research  in  this  area,  it  becomes  apparent  that  the 
various studies are very difficult to compare. While it is undesirable to limit the creativity
of  researchers,  it  is  increasingly  apparent  that  some  kind  of  standard  is  needed,  so  that 
individual  research  efforts  at  least  speak  in  the  same  language. 
Nevertheless, this section offers a summary of the major results achieved in the
area of time complexity of scheduling algorithms, for both the preemptive and
non-preemptive cases. Whenever the result is applicable to periodic task sets, it will be briefly
mentioned.

Table 1.1 lists, for each case, the number of processors ($m$), the
precedence relation (<) among the tasks (if one exists), the valid domain for the release
time ($r_i$), the deadline ($d_i$), the computation time ($c_i$), whether it is preemptive or
non-preemptive, the time complexity of the problem, the reference paper, and, finally, some
additional remarks. Note that in this table most of the results are for non-periodic task
sets. In the following section, the problem of how to apply these results to the periodic
case is addressed.
Table 1.1. Major Results in Scheduling Algorithms


Table 1.2 summarizes the complexity boundaries of various non-preemptive
problems with respect to the number of processors, computation time, and type of partial
order.
Table 1.2. Summary of Non-Preemptive Scheduling Complexity


Table  1.3  is  very  interesting  in  the  sense  that  it  delimits  the  boundaries  between 
NP-completeness  and  polynomial  solvability  for  the  more  constrained  non-preemptive 
scheduling  problem,  where  resources  (Rsrc)  other  than  processors  are  being  requested  by 
the  tasks.  As  can  be  seen,  by  having  no  precedence  relations,  or  for  values  of  m  less  than 
2  in  the  first  case,  or  by  making  m  less  than  three  in  the  second  case,  the  resulting 
problems  can  be  solved  in  polynomial  time.  [GJ75] 
Table 1.3. Complexity of the Scheduling Problem with Several Resources


Other  important  results  are: 

“It is impossible to find a totally optimal run-time scheduler even if
any ready process is permitted to preempt any other process in
progress”. [Mok76]

“When there are mutual exclusion constraints, it is impossible to
find a totally on-line optimal run-time scheduler”. [Mok83]

“The problem of deciding whether it is possible to schedule a set of
periodic processes which use semaphores only to enforce mutual exclusion
is NP-hard”. [Mok83]

“The problem of computing a static schedule for a set of periodic
timing constraints is NP-hard”. [Mok83]

“Non-preemptive scheduling of periodic tasks when release times
are taken into consideration is NP-hard in the strong sense”. [JSM91]

“The processor allocation problem is NP-complete even for the
case where only two processors are available and the processor scheduling
problem resulting from any partition is easy”. [Mok83]

“The problem of finding an optimal schedule is NP-hard for a single
processor even if all tasks have the same ready time and deadline”. [LW90]

4.  A  Brief  Note  about  the  Periodic  Task  Complexity 

It is very common for authors of papers that deal with the scheduling of
non-periodic tasks, i.e., tasks that are executed only once, to infer that their algorithms or
methods can also be applicable to periodic tasks by simply applying the same algorithm to
the set of tasks occurring within a time period that is equal to the least common multiple
of their periods.

Although this assertion is true in most cases, one must note that a polynomial
time algorithm for scheduling non-periodic tasks may take exponential time to schedule a
set of periodic tasks. To see this, consider an algorithm A that
schedules a set $T$ of $n$ non-periodic tasks in time $O(|T|^k)$, where $|T|$ is the size of
the input instance. Clearly, by using a binary encoding,

$$O\left(n + \sum_{i=1}^{n} \log r_i + \sum_{i=1}^{n} \log c_i + \sum_{i=1}^{n} \log d_i\right)$$

bits are needed to encode such an instance. Now, assume a set $T'$ of $n$ periodic tasks with
periods $p_1, p_2, \ldots, p_n$, whose input size is

$$|T'| = O\left(n + \sum_{i=1}^{n} \log r_i + \sum_{i=1}^{n} \log c_i + \sum_{i=1}^{n} \log d_i + \sum_{i=1}^{n} \log p_i\right).$$

Note that in the worst case, when the periods are pairwise relatively prime, the LCM of
the periods is $p_1 \times p_2 \times \cdots \times p_n$. So, in order to use
algorithm A to schedule the periodic task set $T'$, one must first transform $T'$ into an
equivalent set $T''$ of non-periodic tasks with $p_2 \times p_3 \times \cdots \times p_n$
instances of task $T_1$, $p_1 \times p_3 \times \cdots \times p_n$ instances of task $T_2$,
$p_1 \times p_2 \times p_4 \times \cdots \times p_n$ instances of task $T_3$, and so on.

Clearly, the size $|T''|$ of the input instance $T''$ is equal to

$$O\left(n + \sum_{i=1}^{n} \left[ (\log r_i + \log c_i + \log d_i) \times \frac{p_1 p_2 \cdots p_n}{p_i} \right]\right)$$

and algorithm A will take $O(|T''|^k)$ time to schedule all task instances in $T''$. But, since

$$|T''| > C \times \left[ n + \sum_{i=1}^{n} (\log r_i + \log c_i + \log d_i + \log p_i) \right]^k$$

for any constants $C$ and $k$, $O(|T''|^k)$ is exponential with respect to $|T'|$.
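The growth described above is easy to observe numerically. The following sketch counts how many task instances appear when a periodic task set is expanded over one least common multiple of its periods; the function names `hyperperiod` and `expanded_job_count` are illustrative and not taken from the dissertation.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all task periods (positive integers)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def expanded_job_count(periods):
    """Number of task instances in one hyperperiod: the sum of LCM / p_i."""
    h = hyperperiod(periods)
    return sum(h // p for p in periods)
```

Six tasks whose periods are the first six primes, `[2, 3, 5, 7, 11, 13]`, have an LCM of 30030 and expand into 40361 task instances, although the periods themselves fit in a few bits each. Harmonic periods such as `[2, 4, 8]` behave very differently: the LCM is 8 and the expansion contains only 7 instances.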

5.  Complexity  Results  for  Message  Routing  in  Distributed  Systems 

This section presents some very interesting results from Leung [LTW89] regarding
the possibility or impossibility of sending a set of messages in a distributed real-time
system on time. Each message $M_i$ is represented by the quintuple
$(s_i, e_i, l_i, r_i, d_i)$, where $s_i$ denotes the origin node for $M_i$, $e_i$ denotes
the destination node, $l_i$ is the length of $M_i$, $r_i$ is the release time, and $d_i$
denotes the deadline of $M_i$. The problem was studied for both
preemptive and non-preemptive cases, but this discussion will be restricted to the latter. It
is also assumed that the processors are connected by a unidirectional ring. Table 1.4
shows the complexity results for the non-preemptive transmission. An entry marked k
denotes that the parameter is the same for all messages, while a V entry denotes that it can
vary according to the message.
Table 1.4. Complexity for Non-Preemptive Transmissions
As shown in Table 1.4, the message routing problem becomes NP-hard whenever two
or more parameters are allowed to be arbitrary. These and other results had a great
influence on the manner in which this dissertation treats distributed scheduling.
II. CAPS AND PSDL OVERVIEW


A.  MOTIVATION 

The United States Department of Defense (DoD) is currently the world’s largest
user of computers. Each year, billions of dollars are allocated for the development and
maintenance of progressively more complex weapons, communications, and
information systems. These systems increasingly rely on information processing, utilizing
embedded computer systems, and are often characterized by time periods or deadlines
within which some event must occur. Such periods or deadlines are known as “hard
real-time constraints”. Satellite control systems, missile guidance systems, and communications
networks are examples of embedded systems with hard real-time constraints. The
correctness and reliability of these software systems are critical, making software
development of these systems an immense task with increasingly high costs and potential
for design errors [Boo87].

Over the past twenty years, technological advances in computer hardware
technology have reduced the hardware portion of total system cost from 85 percent to
about 15 percent. In the early 1970s, studies showed that computer software alone
comprised approximately 46 percent of the total estimated DoD computer costs. Of this
cost, 56 percent was devoted specifically to embedded systems. In spite of the
tremendous expense, most large software systems were characterized as not providing the
functionality that was desired, taking too long to develop, costing too much time or taking
too much space to use, and lacking the ability to evolve to meet the user's changing needs
[Boo87].

Software engineering evolved in response to the need to more efficiently design,
implement, test, install, and maintain larger and more complex software systems. The
term “software engineering” was coined in 1967 by a NATO study group, and endorsed
by the 1968 NATO Software Engineering Conference [Sch90]. The conference
concluded that software engineering should use the philosophies and paradigms of
traditional engineering disciplines. Numerous methodologies have been introduced to
support software engineering. The major approaches which underlie these different
methodologies are the waterfall model [Lam88], the spiral model [Boe86], and the
prototyping methods of development [Luq89].

B.  THE  WATERFALL  MODEL 

The  waterfall  model  describes  a  sequential  approach  to  software  development  as 
shown in Figure 2.1. The requirements are completely determined before the system is
designed,  implemented  and  tested.  The  cost  of  systems  developed  using  this  model  is  very 
high.  Required  modifications  that  are  realized  late  in  the  development  of  a  system,  such  as 
during  the  testing  phase,  have  a  much  greater  impact  on  the  cost  of  the  system  than  they 
would  have  if  they  had  been  determined  during  the  requirements  analysis  stage  of 
development.  Requirements  analysis  may  be  considered  the  most  critical  stage  of  software 
development,  since  this  is  when  the  system  is  defined. 
Figure 2.1. The Waterfall Model


Requirements  are  often  incompletely  or  erroneously  specified,  due  to  the  often  vast 
difference  in  the  technical  backgrounds  of  the  user  and  the  analyst.  It  is  often  the  case  that 
the  user  understands  his  application  area  but  does  not  have  the  technical  background  to 
communicate  his  needs  to  the  analyst,  while  the  analyst  is  not  familiar  enough  with  the 
application  to  detect  a  misunderstanding  between  himself  and  the  user.  The  successful 
development  of  a  software  system  is  strictly  dependent  upon  this  process.  The  analyst 
must  understand  the  needs  and  desires  of  the  user  and  the  performance  constraints  of  the 
intended  software  system  in  order  to  specify  a  complete  and  correct  software  system. 

Requirements  specifications  are  still  most  widely  written  using  the  English 
language,  which  is  an  ambiguous  and  non-specific  mode  of  communication. 

Another difficulty of the classical life cycle is that communication between a
software development team and the customer or the system's users is weak. Most of the
time the customer does not know what he or she wants. In that case it is hard to
determine the exact requirements, since the software developer is also unfamiliar with the
problem domain of the system. Formal specification languages are used to formalize
customer needs to a certain extent. Another disadvantage of the classical project life cycle
is that a working model of the software system is not available until late in the project time
span. This may cause two things:

1)  A  major  bug  that  remains  undetected  until  the  working  program  is  reviewed, 
which  can  be  disastrous  [Pre87]; 

2) The customer will not have an idea of what the system will look like until it is
complete.

C. THE SPIRAL MODEL

Large real-time systems and systems which have hard real-time constraints are not
well supported by traditional software development methods because the designer of this
type of system would not know if the system can be built with the timing and control
constraints required until after much time and effort has been spent on implementation. A
hard real-time constraint imposes a time-bound on the response time of a process which
must be satisfied under all operating conditions.

To solve the problems raised in requirements analysis for large, parallel,
distributed, real-time, or knowledge-based systems, current research suggests an
alternative paradigm for software development and evolution based on rapid prototyping
[LB88]. The purpose of prototyping is to ensure that proposed requirements and system
concepts adequately match the needs of the prospective client(s) before detailed
optimization and implementation efforts begin. As a software methodology, rapid
prototyping provides the user with increasingly refined systems to test and the designer
with ever better user feedback between each refinement. The result is more user
involvement throughout the development/specification process, and consequently, better
engineered software.

The  prototyping  method  shown  in  Figure  2.2  has  recently  become  popular.  “It  is  a 
method  for  extracting,  presenting,  and  refining  a  user's  needs  by  building  a  working  model 
of  the  ultimate  system  —  quickly  and  in  context”  [Boa84].  This  approach  captures  an 
initial  set  of  needs,  and  quickly  implements  those  needs  with  the  stated  intent  of  iteratively 
expanding  and  refining  them  as  the  user's  and  designer's  understanding  of  the  system 
grows.  The  prototype  is  only  to  be  used  to  model  the  system's  requirements,  rather  than 
as  an  operational  system  [You89]. 
Figure 2.2. The Prototyping Process


This iterative prototyping process is also known as the “Spiral Model of Software
Development” and is illustrated in Figure 2.3. In the prototyping cycle, the system
designer and the user work together at the beginning of the project to determine the
critical parts of the proposed system. The designer then implements a prototype of the
system based on these critical requirements by using a prototype description language
[Luq89]. The resulting system is presented to the user for evaluation. During these
demonstrations, the user determines whether the prototype behaves as it is supposed to,
examines user interface options, and, most importantly, verifies understanding of the
problem and solution. If errors are found at this point, the user and the designer work
together again on the specified requirements to correct them. Concurrently, a risk analysis
is initiated to decide whether or not to move on to the next cycle of the spiral. This
process continues until the user determines that the prototype successfully captures the
critical aspects of the proposed system. This is the point where precision and accuracy are
obtained for the proposed system. The designer then uses the prototype as a basis for
designing the production software.
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Some  advantages  and  disadvantages  of  iterative  development  methodology  are 
listed  below: 

Advantages: 

1)  There  is  constant  customer  involvement  (revising  requirements). 

2) Software development time is greatly reduced.

3)  Methodology  maps  to  reality. 

4)  It  allows  use  of  off-the-shelf  tools. 

Disadvantages: 

1)  There  are  configuration  control  complexities. 

2)  The  developer  is  compelled  to  manage  customer  enthusiasm. 

3) There are uncertainties in contracting the iterative development.

Manual construction of the prototype still takes too much time and can
introduce many errors. Also, it may not accurately reflect the timing constraints placed
upon the system. What is needed is an automated method of rapidly prototyping a hard
real-time system that reflects those constraints and requires minimal development time.
Such a system should exploit reusable components and validate timing constraints.

If Ada software that is reliable, affordable, and adaptable is to be produced and
maintained, the characteristics of Ada may not be the only important matter to consider, as
the characteristics of Ada software development environments may well be critical
[BL91].

The rapid, iterative construction of prototypes within a computer-aided
environment automates the prototyping method of software development, and is called
rapid prototyping. Rapid prototyping provides an efficient and precise means to determine
the  requirements  for  the  software  system,  and  greatly  improves  the  likelihood  that  the 
software  system  developed  from  the  requirements  will  be  complete,  correct,  and 
satisfactory  to  the  user.  The  potential  benefits  of  prototyping  depend  critically  on  the 
ability  to  modify  the  behavior  of  the  prototype  with  less  effort  than  that  required  to  modify 
the  production  software.  Computer  aided  and  object-based  rapid  prototyping  provides  a 
solution  to  this  problem. 

D. THE COMPUTER AIDED PROTOTYPING SYSTEM (CAPS)

The  Computer-Aided  Prototyping  System  (CAPS)  [LK88]  is  a  software 
engineering  tool  for  developing  prototypes  of  real-time  systems.  It  is  useful  for 
requirements  analysis,  feasibility  studies,  and  the  design  of  large  embedded  systems. 
CAPS is based on the Prototype System Description Language (PSDL) [LBY88], which
provides facilities for modeling timing and control constraints within a software system.
An overview of PSDL will be presented in the following section. CAPS is a development
environment,  implemented  in  the  form  of  an  integrated  collection  of  tools,  linked  together 
by  a  user-interface,  and  provides  the  following  kinds  of  support  to  the  prototype  designer: 

•  timing  feasibility  checking  via  the  scheduler, 

• consistency checking and some automated assistance for project planning,
scheduling, designer task assignment, and project completion date estimation
via the Evolution Control System,

• design completion via the editors,

•  computer-aided  software  reuse  via  the  software  base. 

A  CAPS  prototype  is  initially  built  as  an  augmented  data  flow  diagram  and  a 
corresponding  PSDL  program.  The  CAPS  data  flow  diagram  and  PSDL  program  are 
augmented  with  timing  and  control  constraint  information,  which  is  used  to  model  the 
functional  and  real-time  aspects  of  the  prototype.  The  CAPS  environment  provides  all  of 
the  necessary  tools  for  engineers  to  quickly  develop,  analyze,  and  refine  real-time  software 
systems. 

The  general  structure  of  CAPS  is  shown  in  Figure  2.4.  The  CAPS  User-Interface 
provides  access  to  all  of  the  CAPS  tools,  and  facilitates  communication  between  tools 
when  necessary.  The  tools  in  Figure  2.4  are  grouped  into  four  sections:  Editors, 
Execution  Support,  Project  Control  and  Software  Base.  Each  CAPS  tool  is  associated 
with  a  different  aspect  of  the  CAPS  prototyping  process. 


CAPS is specifically designed to assist and partially automate development efforts
which lie in the shaded regions of the prototyping process (Figure 2.2). Specifically, based
on a set of initial requirements, CAPS allows the engineer to design, modify, demonstrate
and validate a software system. Through this process, system requirements can be refined
and modified as necessary.

The CAPS prototyping process is a more specific refinement of the process shown in
Figure 2.2. Its steps are outlined below [Bro94].

1) Based on requirements, design (or modify) the data flow diagram for the system

2) Assign all appropriate timing and control constraints to the prototype operators.
Assign latencies to data streams (if required)

3) Assign data types to all data streams

4) Find (in the software base) or build an implementation module for each
user-defined data type and each atomic operator. Modules taken from the software
base can be modified after retrieval to suit individual needs

5) Build the prototype's user-interface (if required)

6) Translate the CAPS-generated (and user-augmented) PSDL program into (a
portion of) the Ada supervisor module

7) Run the CAPS scheduler to generate the static and dynamic schedules. This
completes the prototype's Ada supervisor module

8) Compile the prototype. (Note: for successful compilation, particular attention
must be paid to the formal parameters of atomic operator implementation
procedures created in step 4)

9) Execute, evaluate and modify (if appropriate) the prototype and/or the
requirements

10) Return to Step 1 if prototype modification is required

The correlation between these 10 steps and Figure 2.2 is obvious. Note that the
basic 10 steps are a bit more detailed than the preceding prototyping process diagram.
The added detail highlights the real-time requirements, and associated design
considerations, of typical CAPS prototypes.

The remainder of this introduction briefly introduces the CAPS tools used to
perform the basic 10 steps. Note, also, that two of the CAPS tools are outside the
purview of the prototyping process diagram. These tools perform ancillary functions
which are not seen in either the prototyping process diagram or the 10 basic CAPS steps.
These advanced feature tools are the Evolution Control System and the Merger.

The  purpose  of  the  Evolution  Control  System  is  to  provide  automated  support  for 
coordinating  the  concurrent  efforts  of  a  team  of  prototype  designers,  and  to  manage 
multiple versions of the designs they produce [Bad93]. The purpose of the Merger is to
combine the effects of two or more enhancements to a prototype that have been
independently developed [Dam94].

CAPS  can  be  executed  in  either  the  designer  mode  or  the  manager  mode.  The 
manager  mode  provides  access  to  CAPS  advanced  features,  including  modification  of  the 
designer  pool,  creation  of  project  work  steps,  and  prototype  change-merging.  CAPS 
supports  distributed  prototype  development,  and  the  manager  interface  provides  facilities 
for  such  efforts.  For  simple,  single-designer  prototype  building,  the  designer  mode  should 
be  used. 

1.  CAPS  Tools 

This  section  provides  a  brief  description  of  each  CAPS  tool. 

a.  The  PSDL  Editor 

The PSDL Editor is the heart of CAPS prototype design. This editor
consists of 3 separate parts: the Syntax Directed Editor, the Graph Viewer, and the
Graphic Editor. This tool allows the designer to create the CAPS data flow diagram and
the PSDL program, and assign all timing and control constraints to prototype components
(operators and data streams).

b.  The  Text  Editor 

Although the text editor is not exclusively a CAPS tool, CAPS does
provide fluid integration of text editing facilities. Designers can select from vi, emacs and
the Verdix Ada Syntax Directed Editor (if available) for editing Ada programs. Use the
“CAPS Defaults” selection under the “CAPS Edit” pull-down menu to make this
selection. The CAPS User-Interface provides convenient file selection lists, based on the
currently selected prototype.

c. The Interface Editor

CAPS integrates TAE+ [Tae93] for creation of window-based user-interfaces
for prototypes. When using the TAE Workbench for creation of such user-interfaces,
the designer must use the “single file” Ada code generation option from within
TAE+. The automatically generated TAE code is placed in the prototype directory in a
file called

<prototype_name>.RAW_TAE_INTERFACE.a. 

For details about how to integrate this file into a prototype, see Chapter
VII of the CAPS Tutorial by Brockett [Bro94].

d.  The  Requirements  Editor 

The current version of CAPS does not have a sophisticated requirements
tracking or editing tool. Simple text editor integration is provided for editing
requirements documents associated with a prototype. CAPS will automatically present
the user with a list of all files with a “.req” suffix when “Requirements” is selected from
the “Edit” pull-down menu. After a file is selected, the default text editor will be invoked
on that file.

e.  The  Change  Request  Editor 

As with requirements, the current version of CAPS does not have a
sophisticated change request tracking or editing tool. Simple text editor integration is
provided for editing change request documents associated with a prototype. CAPS will
automatically present the user with a list of all files with a “.cr” suffix when “Change
Request” is selected from the “Edit” pull-down menu. After a file is selected, the default
text editor will be invoked on that file.

f. The Translator

The  CAPS  translator  converts  a  PSDL  program  into  compilable  Ada 
packages  which  implement  supervisory  aspects  of  the  prototype.  The  translator  expects  a 
complete  PSDL  program  as  input,  and  creates  several  packages  which  make  up,  in  part, 
the supervisor module of the prototype. It is important to note that the translator does not
create Ada implementation packages for atomic operators or user-defined data types.
These  must  be  either  extracted  from  the  software  base,  or  custom-made  by  the  designer. 

g.  The  Scheduler 

The  scheduler  determines  schedule  feasibility  for  CAPS  prototypes. 
Information  is  provided  to  the  scheduler  via  timing  constraints  from  the  prototype’s  PSDL 
program. A prototype must be translated before it can be scheduled, and scheduled before
it can be compiled. Upon scheduling a prototype, CAPS provides schedule diagnostic
information  which  can  be  analyzed  and  used  to  direct  timing  constraint  modifications. 

h.  The  Compiler 

CAPS uses the SunAda Ada compiler. The compilation process is
completely automated via the “Compile” command provided in the “Exec Support”
pull-down menu in the CAPS User-Interface. Successful prototype compilation requires the
formal parameter lists of atomic operator implementation modules to conform to CAPS
interface conventions.

i. The Evolution Control System

The CAPS Evolution Control System (ECS) [Bad93] is a system that
supports distributed prototype development in a team environment. The ECS makes use
of a design database (DDB) for persistent storage of prototype development data. The
ECS supports maintenance of a designer pool from which to draw for prototype
development tasks. Within the ECS, prototype development is modeled as a series of
steps that are created by the project manager. These steps are automatically scheduled
and assigned to available designers.

j.  The  Merger 

The CAPS Merger [Dam94] provides automated prototype change-merging.
Based on slicing theory, as applied to PSDL programs, the Merger automates
the combination of two separate modifications to a base prototype. The Merger detects
and warns of conflicts between the two changes to be merged. If no conflicts occur, or if
they are overridden, the Merger creates a PSDL program for the newly created prototype
which incorporates the changes of each of the modified prototypes.

k.  The  Software  Base 

The  CAPS  software  base  and  its  associated  retrieval  mechanism  [Dol93] 
provide  access  to  a  repository  of  reusable  Ada  and  PSDL  components.  The  software  base 
allows  a  designer  to  browse  as  well  as  query  its  components.  Queries  to  the  software  base 
can  be  in  the  form  of  keywords  or  PSDL  specifications.  In  the  current  release  of  CAPS, 
the  software  base  matching  mechanism  is  based  on  parameter  matching. 

E. THE PROTOTYPE SYSTEM DESCRIPTION LANGUAGE (PSDL)

PSDL is a partially graphical specification language developed for designing
real-time systems. It has several facilities for modeling timing and control constraints, but is
also  useful  for  requirements  analysis  and  feasibility  studies.  It  was  designed  as  a 
prototyping  language  specifically  for  CAPS,  to  provide  the  designer  with  a  simple  way  to 
specify  software  systems  [LBY88].  PSDL  places  strong  emphasis  on  modularity, 
simplicity,  reuse,  adaptability,  abstraction,  and  requirements  tracing. 

A PSDL prototype is built as a hierarchical structure of components, graphically
represented as data flow diagrams, and augmented with timing and control information.
Each component may contain zero or more definitions for OPERATORS and TYPES,
where each definition has two parts:

• Specification part: Defines the external interfaces of the operator or the
type through a series of interface declarations, provides timing constraints, and describes
functionality by using informal descriptions and axioms.

• Implementation part: Denotes what the implementation of the component
is going to be, either in Ada or PSDL. Ada implementations point to Ada modules, which
provide the functionality required by the component's specification. PSDL
implementations are data flow diagrams augmented with a set of data stream definitions
and a set of control constraints.

1.  PSDL  Computational  Model 

PSDL is based on a computational model containing OPERATORS that
communicate via DATA STREAMS, where each stream carries values of a fixed abstract
data type. There are several ADTs already built into PSDL; the PSDL_EXCEPTION is
one of them. Modularity is supported through the use of independent operators that can
only gain access to other operators when they are connected via data streams.

The PSDL computational model is formally represented as an augmented graph
[LBY88]

G = (V, E, T(v), C(v))

where:

• V is a set of vertices

• E is a set of edges

• T(v) is the set of timing constraints for each vertex v

• C(v) is the set of control constraints for each vertex v

Each vertex represents an operator and each edge represents a data stream.
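
As a rough illustration, the augmented graph can be held in a small data structure. The sketch below is written in Python purely for brevity (CAPS itself generates Ada), and every name in it (`Operator`, `Stream`, `AugmentedGraph`, the `timing` and `control` fields, and the example operators) is an assumption made for this example rather than part of PSDL or CAPS.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """A vertex v, carrying its timing constraints T(v) and control constraints C(v)."""
    name: str
    timing: dict = field(default_factory=dict)   # T(v), e.g. {"MET": 10, "PER": 100}
    control: list = field(default_factory=list)  # C(v), e.g. ["TRIGGERED BY ALL raw"]


@dataclass
class Stream:
    """An edge: a data stream from a producer operator to a consumer operator."""
    name: str
    producer: str
    consumer: str


@dataclass
class AugmentedGraph:
    """G = (V, E, T(v), C(v)); T and C are stored on the vertices themselves."""
    vertices: dict = field(default_factory=dict)  # operator name -> Operator
    edges: list = field(default_factory=list)     # list of Stream

    def add_operator(self, op):
        self.vertices[op.name] = op

    def add_stream(self, stream):
        # Both endpoints must already be vertices of the graph.
        if stream.producer not in self.vertices or stream.consumer not in self.vertices:
            raise ValueError("stream endpoints must be known operators")
        self.edges.append(stream)


# Hypothetical two-operator prototype; time values are in arbitrary units.
g = AugmentedGraph()
g.add_operator(Operator("sensor", timing={"MET": 5, "PER": 50}))
g.add_operator(Operator("filter", timing={"MET": 10, "PER": 50},
                        control=["TRIGGERED BY ALL raw"]))
g.add_stream(Stream("raw", producer="sensor", consumer="filter"))
```

Storing T(v) and C(v) on each vertex keeps the constraints next to the operator they describe, which mirrors how the definition attaches them to each vertex v.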

a.  Operators 

An operator represents either a function or a state machine. When it fires,
an operator reads one data object from each of its input data streams and writes at most
one data object on each of its output streams. If the output depends only on the current
set of input values, then the operator represents a function. In other words, a function
gives the same response each time it is triggered with the same input values. If, on the
other hand, the output of the operator depends upon the input values and on internal state
values representing some part of the history of the computation, then the operator
represents a state machine.
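
The distinction can be illustrated with a minimal Python sketch. The names `scale` and `RunningSum` are hypothetical and chosen only for this example; in CAPS, atomic operators of either kind would be written in Ada.

```python
def scale(x):
    """Function operator: the output depends only on the current input."""
    return 2 * x


class RunningSum:
    """State machine operator: the output also depends on an internal state."""

    def __init__(self, initial=0):
        self.total = initial  # the state is explicitly initialized

    def fire(self, x):
        self.total += x
        return self.total


acc = RunningSum()
# The same input produces different outputs because the state has changed.
state_machine_outputs = [acc.fire(3), acc.fire(3)]
```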

A PSDL operator can be either atomic or composite. Operators that are
decomposed into lower levels are called composite operators, and they represent networks
of components. This decomposition is always functional. An operator that is not
decomposed is called atomic. In the current version of CAPS, atomic operators are
implemented in Ada, but any language could be used for that purpose. According to the
PSDL grammar, it is in the implementation part of the operator that we can declare an
operator to be atomic or composite.

b.  Data  Streams 

Data streams represent sequential data flow mechanisms which move data
between operators. There are two kinds of data streams: sampled streams and data flow
streams.

In PSDL the data trigger of a consumer operator determines the type of a
data stream. If the stream is declared in the “TRIGGERED BY ALL” clause of the
consumer operator, then the stream is a data flow stream. In all other cases it is a sampled
stream.

Data flow streams in the current implementation are similar to FIFO
queues with a length of one. Any value placed into the queue must be read by another
operator before any other data value may be placed into the queue, or it will overflow.
Values read from the queue are removed from the queue, and if any attempt is made to
read from an empty queue, it will underflow. A sampled data stream may be considered as
a programming variable which may be written to or read from at any time and as often as
desired. A value is on the stream until it is replaced by another value. Some values may
never be read, because they are replaced before the stream is sampled. Consequently,
care must be taken when reading values from uninitialized sampled streams. All PSDL
data streams contain, at most, one data item at any given time. In summary, a data flow
stream guarantees that none of the data values are lost or replicated, while a sampled
stream does not make such a guarantee.
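
The two stream behaviors described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the CAPS implementation: the class names, the exception names, and the choice to require an initial value for a sampled stream are all assumptions of this example.

```python
class StreamOverflow(Exception):
    """Raised when a value is written before the previous one was read."""


class StreamUnderflow(Exception):
    """Raised when a read is attempted on an empty data flow stream."""


class DataFlowStream:
    """FIFO of length one: every value written is read exactly once."""

    def __init__(self):
        self._value = None
        self._full = False

    def write(self, value):
        if self._full:
            raise StreamOverflow("previous value has not been read yet")
        self._value, self._full = value, True

    def read(self):
        if not self._full:
            raise StreamUnderflow("no unread value on the stream")
        self._full = False  # reading removes the value
        return self._value


class SampledStream:
    """Behaves like a variable: writes overwrite, reads do not consume."""

    def __init__(self, initial):
        self._value = initial  # assumption: force initialization up front

    def write(self, value):
        self._value = value

    def read(self):
        return self._value
```

With this model, writing twice to a `DataFlowStream` without an intervening read raises `StreamOverflow`, whereas writing twice to a `SampledStream` silently discards the first value, which is exactly the lost-value case the text warns about.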

c.  State  Streams 

A  CAPS  prototype  is  a  well-formed  PSDL  program  if  its  graph 
representation  (excluding  all  state  streams)  is  a  directed  acyclic  graph  (DAG).  This 
restriction  may  not  seem  to  make  sense  at  first  glance.  However,  when  a  prototype  graph 
contains  a  cycle,  this  indicates  the  presence  of  state  information,  and  states  must  be 
explicitly  declared  and  initialized.  PSDL  fully  supports  the  integration  of  states  in  its 
prototypes. 
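
The well-formedness condition can be checked mechanically. The following Python sketch uses Kahn's topological-sort algorithm, which is a standard way to test for cycles and is not necessarily what CAPS does internally; the function name and the `(producer, consumer, is_state)` triple representation are assumptions of this example.

```python
from collections import deque


def is_well_formed(operators, streams):
    """Return True if the graph without its state streams is acyclic.

    operators: iterable of operator names
    streams:   iterable of (producer, consumer, is_state) triples
    """
    ops = list(operators)
    successors = {op: [] for op in ops}
    indegree = {op: 0 for op in ops}
    for producer, consumer, is_state in streams:
        if is_state:
            continue  # state streams are excluded from the check
        successors[producer].append(consumer)
        indegree[consumer] += 1

    # Kahn's algorithm: repeatedly remove vertices with no incoming edges.
    ready = deque(op for op in ops if indegree[op] == 0)
    visited = 0
    while ready:
        op = ready.popleft()
        visited += 1
        for nxt in successors[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Every vertex is removed if and only if there is no cycle.
    return visited == len(ops)
```

For example, a feedback loop between two operators passes the check only when the feedback edge is declared as a state stream.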

When  a  state  is  introduced  into  an  atomic  operator,  it  must  be  implemented 
within  the  Ada  code  for  that  operator,  and  shouldn't  appear  in  the  graph  as  a  self  loop 
state  edge. 

d.  Types 

PSDL  user-defined  data  types  are  abstract  data  types  (ADTs)  which  can  be 
used  in  CAPS  prototypes.  PSDL  types,  like  PSDL  operators,  can  be  implemented  in 
either  PSDL  or  Ada.  Types  can  be  associated  with  a  set  of  operators.  Types  implemented 
in  Ada  are  realized  by  an  Ada  package  that  defines  a  private  type  and  a  subprogram  for 
each  operator  on  that  type. 

e.  Exceptions 

Exceptions in PSDL are values that can be transmitted on data streams of
the type “PSDL_EXCEPTION”. During prototype execution, undeclared exceptions are
transformed into PSDL exceptions of the type PSDL_EXCEPTION, which is a subtype of
UNDECLARED_ADA_EXCEPTION. Exceptions can also be raised by explicitly
declaring them in the control constraints part of the PSDL program for the prototype.

f. Timers

PSDL timers are software stopwatches that are used to record the length of
time between events, or to control the duration the system spends in some particular state.
They are declared in the implementation part of a root operator, and are governed by the
control constraints “START TIMER”, “STOP TIMER” and “RESET TIMER”.
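
A stopwatch of this kind might be modeled as in the Python sketch below. The text does not spell out the exact semantics of the three control constraints, so the behavior here is an assumption: `start` resumes counting, `stop` pauses while keeping the accumulated time, and `reset` returns the timer to zero and stops it. The class name, method names, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are likewise hypothetical.

```python
class Timer:
    """Software stopwatch; times are plain numbers supplied by the caller."""

    def __init__(self):
        self._elapsed = 0        # time accumulated while running
        self._started_at = None  # None means the timer is stopped

    def start(self, now):
        # Assumed semantics: starting a running timer has no effect.
        if self._started_at is None:
            self._started_at = now

    def stop(self, now):
        if self._started_at is not None:
            self._elapsed += now - self._started_at
            self._started_at = None

    def reset(self):
        # Assumed semantics: reset clears the reading and stops the timer.
        self._elapsed = 0
        self._started_at = None

    def read(self, now):
        running = 0 if self._started_at is None else now - self._started_at
        return self._elapsed + running
```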

2.  Control  Abstractions 

As a major property of real-time systems, periodic execution, as well as other
timing related attributes, is supported explicitly. The order of execution is only partially
specified: it is determined from the data flow relations given in the enhanced data flow
diagrams, and is also affected by the types of data triggers among operators.

There  are  several  control  aspects  to  be  specified,  such  as  whether  the  operator  is 
periodic  or  sporadic,  the  triggering  conditions,  and  the  output  guards. 

a.  Periodic  and  Sporadic  Operators 

PSDL supports both periodic and sporadic operators. Periodic operators
are triggered by the scheduler at approximately regular time intervals, so that they start
execution somewhere after the beginning of the period, and complete by some deadline,
which defaults to the end of the period. Sporadic operators are triggered by the arrival of
new data, and possibly at irregular time intervals.

b.  Data  Triggers 

Any PSDL operator can have a data trigger, of which there are two kinds,
as illustrated by the following examples:

OPERATOR  P  TRIGGERED  BY  ALL  X,  Y,  Z 

OPERATOR  Q  TRIGGERED  BY  SOME  A,  B 

In the first example, the operator P is ready to fire whenever new data
values have arrived on all three streams X, Y and Z (the triggering set). There may be
other streams coming into the operator P, in which case their data values do not need to be
new. This means that the data streams associated with X, Y and Z are data flow streams.
This kind of trigger should be used when the items in a stream represent discrete events
(e.g., transactions on a bank account) rather than samples from a continuous source of
data (e.g., a temperature sensor). This kind of trigger also ensures that the output of the
operator is always based on fresh data for all of the inputs in the triggering set.

The most important design consideration when “BY ALL” triggers are
used is management of the firing frequencies of the producing and consuming operators.
The period of the consuming operator must be less than or equal to the period of the
producing operator, or stream buffer overflow errors will result (i.e., the consuming
operator must fire at least as often as the producing operator). This is because the
streams in CAPS can hold a maximum of one data item. CAPS ensures that if the
consuming operator’s period is less than that of the producing operator, the actual firing
rate of the two will be the same (i.e., “BY ALL” trigger data streams are tested for new
information prior to the actual firing of the consuming operator).
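
The effect of the period rule can be seen in a small simulation. The Python sketch below is a deliberately simplified model and not the CAPS scheduler: it assumes integer periods, both operators released at time 0, zero execution time, and that at any shared instant the producer writes before the consumer checks the stream. The function name is hypothetical.

```python
def first_overflow(producer_period, consumer_period, horizon):
    """Return the first time a write hits a full one-slot stream, or None."""
    full = False
    for t in range(horizon):
        if t % producer_period == 0:
            if full:
                return t  # the previous value was never consumed
            full = True
        # The consumer is released every consumer_period time units, but it
        # only fires (and empties the stream) when new data is present.
        if t % consumer_period == 0 and full:
            full = False
    return None
```

Under these assumptions, a producer with period 10 and a consumer with period 10 or 5 never overflows the stream, whereas a consumer with period 20 lets the stream overflow at the producer's write at time 20.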

In the second example, the operator Q is ready to fire whenever new data
arrives on at least one of the inputs A or B. This kind of activation condition guarantees
that the output of operator Q is based on the most recent data from at least one of its
critical inputs A and B, listed in the TRIGGERED BY SOME clause. This is also
a very constrained condition, since the scheduler must guarantee that new data on A or B
will not be lost.

If  a  periodic  operator  has  a  data  trigger,  the  operator  is  conditionally 
executed  with  the  data  trigger  serving  as  input  guard. 

If a data trigger is not satisfied, the values are not read and, consequently,
not consumed from any of the input streams.
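
The two trigger kinds reduce to an "all" test and an "any" test over the triggering set. The Python sketch below illustrates this; the function name, the string codes `"ALL"` and `"SOME"`, and the `fresh` dictionary (stream name mapped to whether a new value has arrived since the last read) are assumptions of this example.

```python
def ready_to_fire(trigger_kind, triggering_set, fresh):
    """Decide whether a data trigger is satisfied.

    trigger_kind:   "ALL" or "SOME"
    triggering_set: names of the streams listed in the TRIGGERED BY clause
    fresh:          dict mapping stream name -> True if new data has arrived
    """
    if trigger_kind == "ALL":
        return all(fresh[name] for name in triggering_set)
    if trigger_kind == "SOME":
        return any(fresh[name] for name in triggering_set)
    raise ValueError("unknown trigger kind: " + trigger_kind)


# OPERATOR P TRIGGERED BY ALL X, Y, Z: new data is still missing on Z.
p_ready = ready_to_fire("ALL", ["X", "Y", "Z"], {"X": True, "Y": True, "Z": False})
# OPERATOR Q TRIGGERED BY SOME A, B: new data on B is enough.
q_ready = ready_to_fire("SOME", ["A", "B"], {"A": False, "B": True})
```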

c.  Execution  Guards 

The firing of a PSDL operator can be regulated by an execution guard.
Execution guards are conditional statements which are evaluated prior to firing the
associated operator. Execution guards can depend on data from any incoming data stream
and they can be combined with the “BY ALL” and “BY SOME” data triggers mentioned
above. Even if an execution guard is not satisfied, the values are read and consumed from
all the input streams, without firing the operator. Examples are:

OPERATOR  R  TRIGGERED  BY  SOME  X,  Y  IF  X  >  20.0 

OPERATOR  S  TRIGGERED  IF  X:  EXCEPTION 
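
The key difference from a data trigger is that a failed execution guard still consumes the inputs. The Python sketch below models that behavior under simplifying assumptions: the data trigger is taken to be already satisfied, each input stream is represented by a list holding exactly one pending value, and the names `fire_with_guard`, `guard`, and `body` are hypothetical.

```python
def fire_with_guard(streams, guard, body):
    """Read and consume one value per input stream, then apply the guard.

    streams: dict mapping stream name -> list holding one pending value
    guard:   predicate over the dict of values read
    body:    the operator's computation, called only if the guard holds
    Returns a (fired, result) pair.
    """
    # The values are consumed whether or not the guard is satisfied.
    values = {name: buffer.pop() for name, buffer in streams.items()}
    if not guard(values):
        return False, None
    return True, body(values)


# Modeled on: OPERATOR R TRIGGERED BY SOME X, Y IF X > 20.0
inputs = {"X": [15.0], "Y": [1.0]}
fired, result = fire_with_guard(inputs,
                                guard=lambda v: v["X"] > 20.0,
                                body=lambda v: v["X"] + v["Y"])
```

Here the guard fails because X is 15.0, so the operator does not fire, yet both input buffers are left empty afterwards.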

d.  Conditional  Output 

PSDL conditional output is implemented in CAPS as guarded execution of
code that writes values to data streams. Conditional output does not affect the firing of an
operator,  which  will  fire  in  accordance  with  the  CAPS  schedule  regardless  of  whether  or 
not  its  output  is  written  to  an  output  data  stream.  The  condition  of  an  output  guard  may 
depend  on  the  output  values  of  the  operator,  on  the  values  read  from  the  input  streams, 
and  on  the  values  of  timers. 

3.  Timing  Constraints 

Operators can be time-critical or non-time-critical, depending on whether or not
they are assigned a value for the maximum execution time (MET) by the designer. If
time-critical, they can be further subdivided into periodic or sporadic operators. Periodic
operators are explicitly assigned a frequency (PERIOD) of execution, meaning that they
will fire exactly once within each period, but not necessarily at regular intervals of
time. Sporadic operators are not explicitly assigned a period, but they fire whenever there
is new data on a set of input data streams, subject, however, to a minimum interval of time
between successive firings. Periodic operators can also be triggered by the arrival of data.
However, this trigger will behave like a condition to be checked during periodic firing.
Every sporadic operator has an MRT and MCP in addition to an MET.
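
The classification just described can be expressed as a short decision procedure. In the Python sketch below, the dictionary representation of an operator's constraints and the function name are assumptions of this example; the key names simply reuse the abbreviations from the text.

```python
def classify_operator(constraints):
    """Classify an operator from the timing constraints assigned to it.

    constraints: dict such as {"MET": 10, "PER": 100}
    """
    if "MET" not in constraints:
        return "non-time-critical"
    if "PER" in constraints:
        return "periodic"
    # Time-critical without a period: sporadic, which also needs MRT and MCP.
    missing = [key for key in ("MRT", "MCP") if key not in constraints]
    if missing:
        raise ValueError("sporadic operator is missing " + ", ".join(missing))
    return "sporadic"
```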

Timing  constraints  are  an  essential  part  of  specifying  real-time  systems,  and  in 
PSDL  the  following  timing  constraints  are  supported: 

•  Maximum  Execution  Time  (MET) 

• Period (PER)

• Finish Within (FW)

•  Maximum  Response  Time  (MRT) 

•  Minimum  Calling  Period  (MCP) 

•  Latency  (LAT) 

•  Minimum  Output  Period  (MOP) 

The MET reflects the amount of CPU time that an operator may use for execution,
and is applicable to both periodic and sporadic operators. Note that for atomic operators
the  MET  complies  with  the  above  definition.  For  the  composite  operator,  however,  the 
MET  is  the  maximum  CPU  time  needed  along  any  thread  of  control.  Within  CAPS,  the 
MET  is  assumed  to  account  for  the  following:  data  triggering  checks,  stream  reads, 
execution  guards  checks,  the  execution  itself,  output  guards  checks,  stream  writes,  and 
exception  handling. 
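
For a composite operator, one way to read "the maximum CPU time needed along any thread of control" is as the longest path through the operator's data flow graph, weighted by the METs of its atomic components. The Python sketch below computes that quantity. Treating a thread of control as a path in the graph is an interpretation made for this example, the function and parameter names are hypothetical, and the graph is assumed to be acyclic.

```python
from functools import lru_cache


def composite_met(met, successors):
    """Longest MET-weighted path through an acyclic operator network.

    met:        dict mapping atomic operator name -> its MET
    successors: dict mapping operator name -> list of downstream operators
    """
    @lru_cache(maxsize=None)
    def longest_from(op):
        # MET of this operator plus the most expensive continuation, if any.
        downstream = (longest_from(n) for n in successors.get(op, []))
        return met[op] + max(downstream, default=0)

    return max(longest_from(op) for op in met)


# Hypothetical network: a feeds b and c, and both feed d.
example_met = {"a": 2, "b": 5, "c": 1, "d": 3}
example_successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
total = composite_met(example_met, example_successors)
```

In this example the most expensive path is a, b, d, giving 2 + 5 + 3 = 10 time units.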

This  parameter  is  by  itself  one  of  the  most  difficult  to  quantify.  It  is,  therefore, 
unfortunate  that  it  is  also  one  of  the  most  important  parameters  employed  during  the 
scheduling  process.  Two  alternatives  can  be  taken:  to  use  the  worst-case  execution  times, 
which  can  result  in  a  poor  processor  utilization,  or  to  use  some  value  smaller  than  the 
worst-case,  which  introduces  the  possibility  of  an  overload.  For  reasons  of  safety,  CAPS 
uses  the  first  approach  by  defining  the  MET  as  an  upper-bound  on  the  execution  time. 
For further reading about execution time issues refer to Leinbaugh [Lei80, LY82] and
Mok [Mok83].

Actually, due to the critical nature of the systems that CAPS was intended to
prototype, the worst-case approach has been used throughout its design. This approach is
observable even in the scheduling model, where the non-preemption option was chosen.
This is because if a non-preemptive schedule can be devised for a set of tasks, then a
preemptive one can also be devised, whereas the converse is not always true [Bla76].

The MRT defines an upper-bound on the time between the arrival of new data that
satisfies all data triggering conditions of a sporadic operator and the time when the last
value is written onto the output stream. The MRT applies only to sporadic operators.

The MCP also applies only to sporadic operators, and represents a lower-bound on
the time between two consecutive triggerings of a sporadic operator. It constrains the
behavior of the producers of the triggering data values, rather than constraining the
behavior of the operator itself. Both timing constraints are illustrated in Figure 2.5.
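
Both constraints can be checked against a recorded trace. The Python sketch below is only an illustration of the two definitions: the function names and the representation of a trace as plain numeric timestamps are assumptions of this example.

```python
def mcp_violations(trigger_times, mcp):
    """Return pairs of consecutive triggerings that are closer than the MCP."""
    times = sorted(trigger_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a < mcp]


def meets_mrt(arrival_time, last_output_time, mrt):
    """True if the last output was written within MRT of the data arrival."""
    return last_output_time - arrival_time <= mrt
```

A non-empty result from `mcp_violations` points at the producers of the triggering data, since the MCP constrains their behavior rather than that of the sporadic operator itself.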

As  shall  be  seen  later,  each  sporadic  operator  is  going  to  be  converted  into  an 
equivalent  periodic  one,  whose  period  is  called  the  triggering  period  (TP). 

Scheduling  delay  for  a  sporadic  operator  is  the  interval  of  time  between  the  writing 
into  an  output  data  stream  by  the  producer  and  the  corresponding  reading  of  the  input 
values  by  the  consumer. 


Figure  2.5.  Sporadic  Timing  Constraints 


Periodic operators are triggered by temporal events which must occur at regular
intervals. For each operator, these activation times are determined by the specified period
(PER), which is the time interval between two successive activations. The period applies
only to periodic operators. Note, however, that there is a distinction between activation
time and the actual start time of a periodic operator as shown in Figure 2.6.


Finish  within  (FW)  defines  an  upper  bound  on  the  finish  time  for  a  periodic 
operator.  The  difference  between  the  activation  time  and  its  deadline  is  called  the 
scheduling  interval  (SI)  and  it  is  equal  to  FW. 

Scheduling intervals of a periodic operator can be viewed as fixed windows of a
size equal to FW, evenly separated by the period PER, and whose absolute position on the
time axis is determined by the start time t of its first execution. For the first instance this
time may vary within the closed interval [0, PER] of the operator, and is called the phase
of the operator (Figure 2.7). Scheduling intervals for sporadic operators will be covered
in the next chapter, after we discuss how to deal with this type of operator.

Figure 2.7. The Scheduling Interval

The  difference  between  FW  and  MET  is  called  the  slack  of  the  operator.  Table 
2.1  summarizes  the  timing  constraints  for  periodic  and  sporadic  operators. 


Table  2.1.  Main  PSDL  Timing  Constraints 

To express the behavior of distributed systems, PSDL provides two timing
constraints: Latency (LAT) and the Minimum Output Period (MOP). The latency of a
stream is an upper-bound on the duration of the time interval between the instant a data
value is written into a stream and the instant that data value becomes available for reading
from the stream. In other words, the latency attribute for a stream is meant to specify an
upper-bound on the allowable time spent by that stream in the network. This information
should be used by the scheduler to simulate the worst case behavior for the delay in the
network. Note, however, that this attribute does not explicitly require that the data
carried by the stream should be consumed, within the time interval, by the consumer
operator on the other side of the network. The notation $LAT_{xy}$ will be used to denote the
latency associated with the stream between operators $T_x$ and $T_y$.

The minimum output period is a lower-bound on the duration of the interval
between two successive write events on the stream. In the absence of explicit
synchronization, both the latency and minimum output period of a stream have the default
value of zero (no delay, unbounded data rate). The purpose of these additional constraints
is to declare communication constraints that arise from hardware limitations imposed by
external constraints on how the software functions must be allocated to different physical
nodes of a distributed system. Explicit modeling of these constraints is also sometimes
required to ensure feasibility, because latency affects calculations of time budgets, as well
as maximum execution times for composite operators. The effect of these constraints on
static scheduling is that data cannot be read from a stream until a delay equal to the
latency has elapsed, and that data cannot be written into a stream until the minimum
output period has elapsed.
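
The effect of the two stream constraints on a static schedule can be sketched as follows.
This is only an illustration under stated assumptions: the function names and the sample
values are hypothetical and do not come from PSDL or CAPS.

```python
def earliest_read_time(write_time, latency=0):
    """Earliest instant at which a value written at `write_time` may be read.

    `latency` plays the role of LAT; its default of zero means no delay.
    """
    return write_time + latency


def earliest_next_write(last_write_time, min_output_period=0):
    """Earliest instant at which the next value may be written to the stream.

    `min_output_period` plays the role of MOP; zero means unbounded data rate.
    """
    return last_write_time + min_output_period


# Hypothetical stream: LAT = 20 ms, MOP = 50 ms, a value is written at t = 100 ms.
read_at = earliest_read_time(100, latency=20)
next_write_at = earliest_next_write(100, min_output_period=50)
print(read_at, next_write_at)
```

With the default values of zero, both functions return the write time itself, which
corresponds to the "no delay, unbounded data rate" case described above.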

4.  A  PSDL  Prototype  Example 

Figure 2.8 shows a simple autopilot system that illustrates some of the typical
features of PSDL. The example has a minimal specification part with an informal
description. The implementation part contains a graph, making the operator Autopilot a
“composite” operator. The figure also indicates maximum execution times: 170 ms for
operator display, 50 ms for operators compass and altimeter, and 75 ms for the remaining
operators. All operators are periodic with a period of 500 ms, except for the operator
control_surfaces, which is sporadic, with an MRT and MCP of 900 ms, as shown in
the control constraints part of the PSDL program.

In conclusion, the operator control_surfaces will be triggered
whenever there is new data in either the course_command or the altitude_command
streams. The operators correct_altitude and correct_course will be triggered whenever
there is new data in the actual_altitude and actual_course streams, respectively.

OPERATOR autopilot
SPECIFICATION
    STATES delta_course: INTEGER INITIALLY 0
    STATES delta_altitude: INTEGER INITIALLY 0
    STATES desired_course: INTEGER INITIALLY 0
    STATES desired_altitude: INTEGER INITIALLY 0
END
IMPLEMENTATION
    GRAPH
        [graph not recoverable from the source; one surviving label reads elevator_status]
    DATA STREAM
        actual_altitude: INTEGER,
        actual_course: INTEGER,
        altitude_command: [type illegible],
        course_command: [type illegible],
        elevator_status: elevator_status_type,
        rudder_status: rudder_status_type
    CONTROL CONSTRAINTS
        OPERATOR [name illegible]
            PERIOD 500 MS
        OPERATOR compass
            PERIOD 500 MS
        OPERATOR control_surfaces TRIGGERED BY SOME course_command, altitude_command
            MAXIMUM RESPONSE TIME 900 MS
            MINIMUM CALLING PERIOD 900 MS
        OPERATOR correct_altitude TRIGGERED BY ALL actual_altitude
            PERIOD 500 MS
        OPERATOR correct_course TRIGGERED BY ALL actual_course
            PERIOD 500 MS
        OPERATOR display
            PERIOD 500 MS
END

Figure 2.8. Prototype of an Autopilot

III. FUNDAMENTAL ISSUES IN REAL-TIME SCHEDULING


A.  THE  SCHEDULING  MODEL  AND  SOME  DEFINITIONS 

An instance of a prototype T can be thought of as the union of three disjoint finite
sets, namely the set P of periodic operators, the set S of sporadic operators and the set N
of non-time-critical operators. Within CAPS, each periodic operator can be described, for
scheduling purposes, as a three-tuple (METx, PERx, FWx), where METx is the maximum
execution time used by each instance of operator X, PERx is its period and FWx is the
length of its scheduling interval. Likewise, each sporadic operator can be described as a
three-tuple (METx, MCPx, MRTx)^SP, where MCPx is the minimum period between two
consecutive instances of operator X, and MRTx is the upper bound on the time between
the triggering of operator X by some new data arrival and the completion of writing to all
of its output streams. The superscript SP is used in the sporadic case only to distinguish it
from the three-tuple of the periodic operator. Given any static schedule for a prototype T,
we shall use $s_{ix}$, $f_{ix}$ and $d_{ix}$ to denote the actual starting time, completion time and
deadline of the i-th instance of operator X in the schedule. In any feasible schedule, we
must have

$$0 \le s_{1x} \le \mathrm{PER}_x$$

and

$$d_{ix} = s_{1x} + (i-1) \times \mathrm{PER}_x + \mathrm{FW}_x \qquad \text{Eq. (1)}$$

for every periodic operator X, where $s_{1x}$ is called the phase of operator X as defined in
Chapter II. Note also from Eq. (1) that the deadline for the first instance of any operator is
calculated relative to its start time rather than from time zero*. This releases the
scheduler from enforcing the condition that the first instance of operator X should
finish by time PERx. Whenever possible, the letters X and Y will be used to
denote operators, leaving the letters i and j to denote their corresponding instances.
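
Eq. (1) can be illustrated with a short sketch. The function name `deadline` and the
sample operator below are illustrative assumptions, not part of CAPS.

```python
def deadline(i, phase, per, fw):
    """Deadline of the i-th instance (i >= 1) following Eq. (1).

    d_ix = s_1x + (i - 1) * PER_x + FW_x, where `phase` is s_1x.
    """
    if i < 1:
        raise ValueError("instances are numbered from 1")
    if not 0 <= phase <= per:
        raise ValueError("the phase must lie in the closed interval [0, PER]")
    return phase + (i - 1) * per + fw


# Hypothetical operator: PER = 500 ms, FW = 300 ms, first start at t = 120 ms.
deadlines = [deadline(i, phase=120, per=500, fw=300) for i in (1, 2, 3)]
print(deadlines)
```

Note that the first deadline is measured from the phase, not from time zero, which is
the point made in the paragraph above.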


*Time zero is defined as the time when the prototype starts execution. In reality it is the start time of the
first operator according to the topological sort.

Since, in general, the release time does not affect the complexity of the scheduling
problem [Mok83], it will be assumed that all first instances are released at time zero, but
may be constrained by the precedence relationship between the operators, if one exists.

By  definition,  every  periodic  operator  must  start  and  finish  execution  within  its 
period  of  activation. 

The following restriction is also imposed on the model: the maximum
execution time must be smaller than or equal to the finish-within, which in turn must be
smaller than or equal to the period:

MET ≤ FW ≤ PER

Clearly, the first inequality is needed; otherwise there is no way to execute such an
operator within the specified amount of time (FW).

One  may  want  to  argue  that  there  is  a  need  to  relax  the  second  inequality  to  PER  < 
MET  <  FW.  Since  PER  <  MET,  such  processor  demand  can  only  be  satisfied  using 
pipelining  in  a  multiprocessor  environment  [Luq93,  LSB93],  which  will  be  discussed  in  the 
next  section. 

Note that all of the above assumptions also apply to sporadic operators, since they
will be converted into equivalent periodic operators, as will be seen
later in this chapter.

The Harmonic Block (HB) of a periodic task set P is the least common multiple
(LCM) of all the periods in P. It is the interval upon which the task set will be tested for
schedulability. If a feasible schedule can be found within 2×HB, in the case where
latencies are not allowed in the schedule, or in at most 3×HB if latencies are allowed,
then it is possible to say that the same pattern can be repeated forever. This topic will be
further discussed in Section C.
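
As a small illustration, the harmonic block and the search bounds quoted above can be
computed as follows. The helper names and the sample periods are assumptions made for
this sketch only.

```python
from math import lcm  # available from Python 3.9


def harmonic_block(periods):
    """Least common multiple of all periods in the task set."""
    return lcm(*periods)


def search_horizon(periods, latencies_allowed=False):
    """Interval searched for a schedule: 2*HB without latencies, 3*HB with them."""
    factor = 3 if latencies_allowed else 2
    return factor * harmonic_block(periods)


periods = [40, 45, 100]  # hypothetical periods, in milliseconds
hb = harmonic_block(periods)
print(hb, search_horizon(periods), search_horizon(periods, latencies_allowed=True))
```

Even for three modest periods the harmonic block is already 1800 ms, which hints at why
the length of the interval matters for schedule construction.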

A prototype T is said to be schedulable if there exists a schedule such that the
completion time for the execution of instance i of operator X ($f_{ix}$) is less than or equal to
its corresponding deadline $d_{ix}$, for all i and X, and the precedence constraints of the
prototype T are satisfied.

The precedence constraint between operators X and Y, written as X < Y, where <
denotes a partial ordering on the execution of tasks X and Y, is satisfied if, ∀ instances i, j,

$$(i-1) \times \mathrm{PER}_x + s_{1x} < (j-1) \times \mathrm{PER}_y + s_{1y}$$

and

$$(j-1) \times \mathrm{PER}_y + s_{1y} + \Delta < i \times \mathrm{PER}_x + s_{1x}$$

where $(i-1) \times \mathrm{PER}_x = (j-1) \times \mathrm{PER}_y$ and Δ equals the maximum time to read the
input of operator Y.

Operators from either the periodic set P or from the sporadic set S are
non-preemptable, which means that once they start execution they will run to completion.
The only operators that can be preempted are those belonging to the set N.

No  idle  time  is  inserted  into  the  static  schedule,  unless  there  are  no  operators  ready 
to  execute. 

All timing information is assumed to be an integral multiple of a basic unit of time,
which within CAPS is assumed to be the millisecond. Table 3.1 presents a summary of the
major assumptions of the scheduling model.

    For all periodic operators, MET ≤ FW ≤ PER
    All time-critical operators are non-preemptable
    Time is discrete
    A periodic operator is completely specified by the tuple (MET, PER, FW)
    A sporadic operator is completely specified by the tuple (MET, MCP, MRT)^SP
    Static scheduling is assumed

Table 3.1. Summary of our Scheduling Model

In the next section, a series of theorems on the schedulability of independent
non-preemptive periodic task sets will be presented. They will provide the necessary
background to build a framework upon which the later sections of this chapter will be
based.


^ This condition will be relaxed after we present our new synchronization model in Chapter IV.

B. CONDITIONS FOR SCHEDULABILITY OF NON-PREEMPTIVE TASKS

In  this  section,  a  series  of  schedulability  checks  are  introduced  for  a  periodic  task 
set  P  that  has  no  precedence  constraints.  These  results  will  be  also  applied  to  a  set  of 
periodic  tasks  with  precedence  constraints  in  Section  D  of  this  chapter. 

1.  The  Maximum  Execution  Time  Theorem 

When dealing with non-preemptive uniprocessor static scheduling, a sufficient
condition for infeasibility occurs whenever a task requires more computation time than
the period of any other task, or more specifically, more than the minimum period among
all tasks. Formally:

Theorem  1: 

“For an independent periodic task set P, if ∃ some tasks X and Y ∈ P such that
METx > PERy, then P is not schedulable in the uniprocessor case by any non-preemptive
algorithm. Furthermore, if X = Y then neither the preemptive nor the non-preemptive
algorithms can find a feasible schedule.”

Proof: 

Clearly, whenever task X executes, task Y, which happens to have a smaller
period, will be blocked for an interval of time bigger than its period, which contradicts
the definition of a periodic task. □
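
The test of Theorem 1 amounts to comparing every MET against the minimum period. A
minimal sketch, assuming tasks are given as (MET, PER, FW) tuples and using a
hypothetical function name:

```python
def theorem1_unschedulable(tasks):
    """True if some MET_x exceeds some PER_y, i.e. exceeds the minimum period.

    `tasks` is an iterable of (MET, PER, FW) tuples. A True result means that,
    by Theorem 1, no non-preemptive uniprocessor schedule exists.
    """
    tasks = list(tasks)
    min_period = min(per for _, per, _ in tasks)
    return any(met > min_period for met, _, _ in tasks)


# Hypothetical task sets used only to exercise the check.
blocked = [(150, 600, 600), (20, 100, 100)]   # 150 > 100
fine = [(5, 10, 10), (2, 5, 5)]               # no MET exceeds 5
print(theorem1_unschedulable(blocked), theorem1_unschedulable(fine))
```

A False result says nothing about schedulability; the check only detects the
infeasibility condition stated in the theorem.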

Note that the theorem still holds if precedence relationships exist among the tasks
in P. This same result is also valid for a sporadic task set when METx > MCPy for X = Y
(trivial case). However, for X ≠ Y the situation is slightly more complex, and there are
two cases to consider. The first is when MRTy < MCPy, and it is clearly not schedulable.
The second case is when MRTy ≥ MCPy, and the set is not schedulable if METx + METy >
MRTy, as shown in Figure 3.1.

Corollary: (for the distributed case)

“For an independent periodic task set P, if ∃ some tasks X and Y ∈ P such that
METx > PERy, then in order for P to be schedulable in the multiprocessor case, tasks X
and Y must be placed in different processors, and if X = Y, then it must be pipelined.” □

The  conditions  imposed  on  a  task  X  for  it  to  be  pipelineable  as  well  as  a  detailed 
description  of  pipelining  in  this  context,  can  be  found  in  the  work  of  Luqi  [Luq93]  and 
Luqi,  Shing  and  Brockett  [LSB93]. 

There are two ways to handle pipelining. The first is to use task migration at
run-time, which involves sending a copy of the code and data to be executed in the other
processor. This presents the following problems:

1) It increases the context switching overhead, with direct impact on the timing
constraints

2) There is a need to create an additional task to handle the dispatching of tasks

3) It is not well suited for static scheduling


The second approach is to replicate the tasks to be pipelined in the other processors
in a pre-processing step. For example, consider a periodic operator OPa(150,100,150)
with inputs D1, D2 and output D3, as shown in Figure 3.2. As shown in Figure 3.2b, we
can replace operator OPa with two identical operators, OPb(150,200,150) and
OPc(150,200,150), with twice the original period and a state stream syn, whose latency
equals the time taken by the non-overlappable segment of the code implementing operator
OPa. The operators OPb and OPc will be triggered alternately on the value of syn.
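
The replacement of OPa by OPb and OPc can be sketched as a pre-processing function.
The source only describes the two-copy case; the generalization to ceil(MET/PER)
copies, the function name, and the omission of the syn stream are assumptions of this
sketch.

```python
def replicate_for_pipelining(met, per, fw):
    """Replace one operator with MET > PER by identical copies with a longer period.

    Assumption (not stated in the source): the number of copies is the smallest
    integer k with k * PER >= MET, and each copy gets period k * PER.
    The state stream `syn` that alternates the copies is not modeled here.
    """
    copies = -(-met // per)  # integer ceiling of MET / PER
    return [(met, copies * per, fw) for _ in range(copies)]


# The example from the text: OPa(150, 100, 150) becomes OPb and OPc.
replicas = replicate_for_pipelining(150, 100, 150)
print(replicas)
```

An operator with MET no larger than PER is returned unchanged as a single copy, since
it needs no pipelining.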


The replication of tasks throughout the system presents the following problems:

1) It increases the memory requirements for the processors

2) It demands highly sophisticated mechanisms for implementing tightly
synchronized schedules among the processors, which restricts this approach to
the shared memory models with a global clock

Both of the above discussed methods, however, suffer from the very serious
problem of having to quantify the timing parameters of the segments of code that cannot
be overlapped, which is by itself one of the hardest problems. If those timing parameters
could be known in advance, then the operator could be separated into independent parts,
and pipelining would not be needed.

The validity of pipelining in a hard real-time environment is therefore questionable,
and, furthermore, it is impossible to implement in a distributed system where there is no
inexpensive method by which to assure tight synchronization among tasks.

2. The Finish-Within Theorem

Theorem  2: 

“For an independent periodic task set P, if ∃ some indivisible task X ∈ P such that
METx > FWx, then P is not schedulable under any scheduling algorithm, not even in a
multiprocessor environment.”

Proof: 

Clearly, if METx > FWx, the only way to handle this case is if we could split task X
into two or more data independent partitions, so that they could run in parallel on different
processors, but, as stated in the theorem, X is indivisible. □

Note that this theorem can be easily extended to cover the sporadic case when
METx > MRTx. It is also applicable to the case where we have precedence constraints in
the set P.
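
Theorem 2 and its sporadic extension can be checked operator by operator. The sketch
below assumes periodic operators are given as (MET, PER, FW) and sporadic ones as
(MET, MCP, MRT); the function name and the sample values are hypothetical.

```python
def theorem2_violations(periodic, sporadic=()):
    """List the operators whose MET exceeds FW (periodic) or MRT (sporadic).

    Any entry in the result makes the set unschedulable by Theorem 2,
    provided the operator is indivisible.
    """
    bad = [("periodic", op) for op in periodic if op[0] > op[2]]
    bad += [("sporadic", op) for op in sporadic if op[0] > op[2]]
    return bad


# Hypothetical operators: the second periodic one cannot finish within FW.
periodic_ops = [(50, 500, 500), (170, 500, 120)]
sporadic_ops = [(75, 900, 900)]
print(theorem2_violations(periodic_ops, sporadic_ops))
```

An empty list means only that this particular necessary condition is met.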

3.  The  Minimum  Period  Theorems 

In  the  other  extreme  of  Theorem  1,  there  is  a  sufficient  but  not  necessary  condition 
to  guarantee  schedulability  of  an  independent  periodic  task  set,  as  stated  in  Theorem  3: 

Theorem  3: 

“For a periodic task set P, if ∀ tasks X ∈ P, FWx ≥ PERx and

$$\sum_{x=1}^{n} \mathrm{MET}_x \le \mathrm{PER}_z,$$

where PERz denotes the minimum period in P, then P is schedulable.”*

Proof: 

The minimum period is certainly a divisor of the least common multiple of the
periods (LCM), and, as such, it can span the entire LCM within an integral number of
steps. It is a kind of sliding bin-packing, where a sliding window of size equal to the
minimum period is always large enough to fit all tasks present in that window.
Of course, depending on the periods, not all instances may be active simultaneously in
that specific window. However, in the event that they are, the instances will always fit
in there. □

* A similar result was achieved independently by Zhu, et al. [ZLC94] [remainder of footnote illegible in the source].

As shall be seen later, this theorem is valid even when precedence constraints are
taken into consideration.

Figure 3.3. The Minimum Period Sliding Window


It  is  possible  to  use  a  counter  example  to  show  that  the  above  condition  is  a 
sufficient  but  not  necessary  condition.  Consider  two  periodic  tasks  with  the  following 
timing  constraints:  (5,10,10)  and  (2.5,5,5).  The  sum  of  METs  is  bigger  than  the  minimum 
period,  but  this  task  set  is  still  schedulable. 

What happens if all deadlines are restricted to be less than or equal to their
corresponding periods? In this case it could be said that Theorem 3 is not applicable, as
illustrated by the following example: (3,5,3), (1,10,3).

Theorem 4:

“For a periodic task set P, if ∀ tasks X ∈ P,

$$\sum_{x=1}^{n} \mathrm{MET}_x \le \mathrm{FW}_z,$$

where FWz denotes the minimum FW in P, then P is schedulable.”

Proof: 

The same idea of sliding bin-packing applies here. Now, however, the size of the
bin must be decreased. In other words, the “bin” now should be understood to be the
least value among all periods and FW, among the tasks from P. □
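
Both sufficient conditions reduce to comparing the sum of the METs against the size
of the "bin". The sketch below follows that reading; the function names are
illustrative, and tasks are assumed to be (MET, PER, FW) tuples.

```python
def theorem3_holds(tasks):
    """Sufficient test of Theorem 3: FW >= PER for all tasks, sum(MET) <= min PER."""
    total_met = sum(met for met, _, _ in tasks)
    min_period = min(per for _, per, _ in tasks)
    return all(fw >= per for _, per, fw in tasks) and total_met <= min_period


def theorem4_holds(tasks):
    """Sufficient test of Theorem 4: sum(MET) <= minimum FW in the set."""
    total_met = sum(met for met, _, _ in tasks)
    return total_met <= min(fw for _, _, fw in tasks)


# The two examples quoted in the text, plus one hypothetical set that passes.
counter_example = [(5, 10, 10), (2.5, 5, 5)]   # schedulable, yet the test fails
tight_deadlines = [(3, 5, 3), (1, 10, 3)]      # FW < PER, Theorem 3 not applicable
small_set = [(1, 5, 3), (1, 10, 3)]            # hypothetical
print(theorem3_holds(counter_example), theorem3_holds(tight_deadlines),
      theorem4_holds(tight_deadlines), theorem4_holds(small_set))
```

Because the conditions are sufficient but not necessary, a False result, as for the
first example, does not imply that the set is unschedulable.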

The next theorem to be presented is the Load Factor Theorem, which is very well
known in the field of scheduling. It defines a necessary condition for the schedulability of
a periodic task set, and it basically stipulates that if the summation of all individual load
factors (METx/PERx) is bigger than the number of available processors, then the set is not
schedulable [LL73].

4.  The  Load  Factor  Theorem 

Theorem  5: 

“For a periodic task set P, if

$$\sum_{x=1}^{n} \frac{\mathrm{MET}_x}{\mathrm{PER}_x} > k,$$

where k is the number of available processors, then the set is not schedulable.”

Proof: 

A very simple proof is given independently by Zhu [ZLC94] and Jeffay [JSM91]
for the case where k equals 1. Basically, if both sides of the inequality are multiplied by
the least common multiple (LCM) of their periods, it does not affect the inequality, but
now

$$\sum_{x=1}^{n} \mathrm{MET}_x \times \frac{\mathrm{LCM}}{\mathrm{PER}_x} > \mathrm{LCM} \qquad \text{Eq. (2)}$$

Clearly, the ratio LCM/PERx defines an integer that represents the number of
instances for each task X within the LCM. If the number of instances of each task is
multiplied by its maximum execution time and the results are then added, the result is the
total computation time needed by the entire task set. According to Eq. (2), however, the
total computation time needed is bigger than the LCM. In other words, even if all
instances are executed one after another, they would not be able to finish within the LCM.
The case for k greater than one follows automatically. □
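
The two equivalent forms used in the proof, the load factor and the total demand over
one LCM as in Eq. (2), can be compared numerically. The helper names are assumptions
of this sketch, and exact rational arithmetic is used to avoid rounding effects.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def load_factor(tasks):
    """Sum of MET/PER over all (MET, PER, FW) tasks, as an exact fraction."""
    return sum(Fraction(met, per) for met, per, _ in tasks)


def demand_over_lcm(tasks):
    """Total computation time needed in one LCM, and the LCM itself (Eq. 2)."""
    hb = lcm(*(per for _, per, _ in tasks))
    return sum(met * (hb // per) for met, per, _ in tasks), hb


tasks = [(8, 45, 20), (9, 40, 30), (10, 100, 100)]
u = load_factor(tasks)
total, hb = demand_over_lcm(tasks)
print(u, total, hb)
```

For a single processor, `total > hb` holds exactly when `u > 1`, which is the
multiplication step carried out in the proof.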

It should also be clear from the proof of Theorem 5 that it is valid for both
preemptive and non-preemptive algorithms [ZLC94].

5.  The  Task  Demand  Theorem 


The following theorem is based upon the previous work of Jeffay, et al. [JSM91],
which established necessary and sufficient conditions for schedulability of an independent
periodic task set in a non-preemptable uniprocessor environment. The theorem to be
introduced next is an adaptation for the scheduling model used in this dissertation. It
differs from the original theorem in the following respects: Jeffay’s model accounts only
for tasks that are independent, there was no explicit deadline for the tasks other than
their own period, and his definition of a schedulable set of tasks required that both
conditions in the theorem should be valid for every concrete task set generated from P,
where a concrete task set can be viewed as the original independent periodic task set P
with specific release times for the first instance of every operator in P.

The inclusion of a deadline which differs from the corresponding period made the
problem a lot more complex, since tasks can now finish as early as their MET.
The new results are presented in the following theorems:

Theorem  6: 

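“For an independent periodic task set P, where the tasks are sorted in
non-decreasing order by finish-within (i.e., for any pair of tasks X and Y, if X < Y, then
FWx ≤ FWy), if there exists a feasible schedule for every concrete task set in P, then the
following conditions hold:

1)
$$\sum_{x=1}^{n} \frac{\mathrm{MET}_x}{\mathrm{PER}_x} \le 1$$

2) $\forall x,\ 1 \le x \le n;\ \forall k,\ 0 \le k < \frac{\mathrm{LCM}}{\mathrm{PER}_x}$:
$$\sum_{y=1}^{n} N(y,\ k \times \mathrm{PER}_x + \mathrm{FW}_x) \times \mathrm{MET}_y \le k \times \mathrm{PER}_x + \mathrm{FW}_x$$

3) $\forall x,\ 1 < x \le n;\ \forall L,\ \mathrm{FW}_1 < L < \mathrm{FW}_x$:
$$L \ge \mathrm{MET}_x + \sum_{y=1}^{x-1} N(y,\ L-1) \times \mathrm{MET}_y$$

where

$$N(y, L) = \begin{cases} \left\lfloor \dfrac{L}{\mathrm{PER}_y} \right\rfloor & \text{if } L \bmod \mathrm{PER}_y < \mathrm{FW}_y \\[2ex] \left\lfloor \dfrac{L}{\mathrm{PER}_y} \right\rfloor + 1 & \text{if } L \bmod \mathrm{PER}_y \ge \mathrm{FW}_y \end{cases}$$

and LCM is the least common multiple of all the periods of the periodic task set.”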
Proof: 

Condition 1) is basically Theorem 5 for the uniprocessor case. Conditions 2) and
3) together say that for the set to be schedulable, the processor demand in the interval
[0, L] (i.e., the sum of computation times from all instances that must finish in the interval
[0, L]) must always be less than or equal to the length L. As in Jeffay’s work [JSM91],
the contrapositive of Conditions 2) and 3) will be proven. To prove the contrapositive of
Condition 2), consider a concrete set of periodic tasks {$T_1$, $T_2$, ..., $T_n$} where, for
1 ≤ x ≤ n, the release time of the first instance of task $T_x$ is 0. Then, for every x,
1 ≤ x ≤ n, and every k, 0 ≤ k < LCM/PERx, the processor demand
$d_{0,\,k \times \mathrm{PER}_x + \mathrm{FW}_x}$ from all task instances that must finish in the interval
[0, k×PERx+FWx] is given by

$$d_{0,\,k \times \mathrm{PER}_x + \mathrm{FW}_x} = \sum_{y=1}^{n} N(y,\ k \times \mathrm{PER}_x + \mathrm{FW}_x) \times \mathrm{MET}_y$$

So if Condition 2) does not hold, then there exist an x and a k such that
$d_{0,\,k \times \mathrm{PER}_x + \mathrm{FW}_x}$ > k×PERx+FWx, and P has an unschedulable concrete set.

To prove the contrapositive of Condition 3), consider a concrete set of periodic
tasks {$T_1$, $T_2$, ..., $T_n$} where, for some task $T_x$, the release time of its first instance
is 0, and for all y ≠ x, the release time of the first instance of task $T_y$ is 1, as shown in
Figure 3.4.


Since neither preemption nor inserted idle time are allowed, the first instance of
task $T_x$ must execute in the interval [0, METx]. For all L, FW1 < L < FWx, in the interval
[0, L] the processor demand $d_{0,L}$ from all task instances that must finish by time L is
given by

$$d_{0,L} = \mathrm{MET}_x + \sum_{y=1}^{x-1} N(y,\ L-1) \times \mathrm{MET}_y$$

So, if Condition 3) does not hold, then $d_{0,L}$ > L, and P has an unschedulable concrete set. □
Note also that the function N(y, L) can also be expressed in closed form as follows:

$$N(y, L) = \left\lfloor \frac{L}{\mathrm{PER}_y} \right\rfloor + \min\left(1,\ \left\lfloor \frac{L - \left\lfloor \dfrac{L}{\mathrm{PER}_y} \right\rfloor \times \mathrm{PER}_y}{\mathrm{FW}_y} \right\rfloor \right)$$

The left hand side of the addition operator specifies how many full periods there
exist for task Y within L, while the right hand side specifies whether the remaining fraction
of a whole period is large enough for a scheduling interval (i.e., FWy) of task Y. The
minimum comes into play because if FWy < L/2 < PERy, it would contribute more than
once for the processor demand in the first period, which cannot occur.
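
The agreement between the piecewise definition of N(y, L) and a closed form of this
shape can be checked numerically. The sketch assumes the closed form is the floor of
L/PER plus the minimum of 1 and the floor of the remainder divided by FW; the function
names are illustrative.

```python
def n_piecewise(L, per, fw):
    """N(y, L): full periods in L, plus one if the remainder is at least FW."""
    full, rest = divmod(L, per)
    return full if rest < fw else full + 1


def n_closed_form(L, per, fw):
    """Same count written with floor and min instead of a case split."""
    full = L // per
    return full + min(1, (L - full * per) // fw)


# Compare both forms on a few (PER, FW) pairs over a range of interval lengths.
pairs = [(45, 20), (40, 30), (100, 100), (10, 3)]
agree = all(
    n_piecewise(L, per, fw) == n_closed_form(L, per, fw)
    for per, fw in pairs
    for L in range(0, 400)
)
print(agree, n_piecewise(65, 45, 20), n_closed_form(29, 40, 10))
```

The pair (10, 3) exercises the case where the remainder holds several multiples of FW,
which is where the minimum prevents counting the same instance more than once.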

As an example, consider the task set T1(8,45,20), T2(9,40,30), and T3(10,100,100),
already sorted by FW.

Clearly, n = 3 and the interval of interest is 20 < L < 100.

Let i = 1, then L = 20, which is the trivial case.

Let i = 2, then 20 < L < 30
    for 20 < L < 30, L must be ≥ 9 + 8 ✓

Let i = 3, then 20 < L < 100
    for 20 < L < 30, L must be ≥ 10 + 8 ✓
    for 30 ≤ L < 65, L must be ≥ 10 + 8 + 9 ✓
    for 65 ≤ L < 70, L must be ≥ 10 + 8 + 8 + 9 ✓
    for 70 ≤ L < 100, L must be ≥ 10 + 8 + 8 + 9 + 9 ✓

If the task set had failed any of the conditions, it could be said that there exists at
least one concrete task set that cannot be scheduled. Conversely, if all conditions are
satisfied, nothing more can be stated until Theorem 7 is introduced.
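
The three conditions of Theorem 6 can be evaluated mechanically for the example
above. The sketch assumes integer timing values, (MET, PER, FW) tuples, and the
piecewise definition of N(y, L); all function names are hypothetical.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def n_instances(L, per, fw):
    """N(y, L): instances of a task released at 0 that must finish within [0, L]."""
    full, rest = divmod(L, per)
    return full + (1 if rest >= fw else 0)


def theorem6_conditions(tasks):
    """Return (cond1, cond2, cond3) for a list of (MET, PER, FW) tuples."""
    tasks = sorted(tasks, key=lambda t: t[2])  # non-decreasing FW
    hb = lcm(*(per for _, per, _ in tasks))

    def demand(L, subset):
        return sum(n_instances(L, per, fw) * met for met, per, fw in subset)

    cond1 = sum(Fraction(met, per) for met, per, _ in tasks) <= 1
    cond2 = all(
        demand(k * per + fw, tasks) <= k * per + fw
        for _, per, fw in tasks
        for k in range(hb // per)
    )
    fw1 = tasks[0][2]
    # For the first task the range of L is empty, matching 1 < x <= n.
    cond3 = all(
        L >= met + demand(L - 1, tasks[:x])
        for x, (met, _, fw) in enumerate(tasks)
        for L in range(fw1 + 1, fw)
    )
    return cond1, cond2, cond3


example = [(8, 45, 20), (9, 40, 30), (10, 100, 100)]
print(theorem6_conditions(example))
```

The brute-force enumeration over every integer L is affordable here because time is
discrete and the finish-within values are small.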

Theorem  7: 

“If an independent periodic task set P is schedulable according to Theorem 6, then
the non-preemptive Earliest Deadline First (EDF) algorithm will be able to find a feasible
schedule for P.”

Proof: 

As in Jeffay’s work [JSM91], this theorem shall be proved by contradiction.
Assume that a task in P misses a deadline at some point in time when P is scheduled by the
EDF algorithm. Let $t_d$ be the earliest point in time at which a deadline is missed. All
instances of P can be partitioned into three disjoint sets S1, S2 and S3 where:

S1 is the set of task instances with a deadline at $t_d$;

S2 is the set of task instances with an invocation before $t_d$ and deadlines after $t_d$,
and

S3 is the set of task instances not in S1 or S2.

Let $t_0$ be the end of the last period prior to $t_d$ in which the processor was idle. If
the processor has never been idle, then $t_0$ = 0. Since neither preemption nor inserted idle
time are allowed, all task instances which are executed in the interval [$t_0$, $t_d$] must be
activated at or after $t_0$. Depending on whether the interval [$t_0$, $t_d$] contains any task from
the set S2, the following two cases exist:

Case 1: None of the tasks in S2 are scheduled in the interval [$t_0$, $t_d$].

This case only happens if $t_0$ = 0. Otherwise, we either have an instance that misses
its deadline in the interval [0, $t_0$] if $t_0 - 0 > t_d - t_0$, or the processor has an idling period
in the interval [$t_0$, $t_d$] if $t_0 - 0 \le t_d - t_0$. Furthermore, $t_d$ ≤ LCM. Otherwise, we must
have another instance that misses its deadline prior to $t_d$.

Let $T_{jx}$ be the task instance that misses the deadline at time $t_d$. Then, $t_d$ − 0 =
k×PERx+FWx for some k, 0 ≤ k < LCM/PERx. The processor demand
$d_{0,\,k \times \mathrm{PER}_x + \mathrm{FW}_x}$ from all instances which must finish in the interval
[0, k×PERx+FWx] equals

$$\sum_{y=1}^{n} N(y,\ k \times \mathrm{PER}_x + \mathrm{FW}_x) \times \mathrm{MET}_y$$

and it is greater than k×PERx+FWx, a contradiction.

Case 2: Some of the task instances of S2 are scheduled to run in the interval [$t_0$, $t_d$].

Let $T_{ix}$ be the last instance in S2 scheduled to run prior to $t_d$ in the interval [$t_0$, $t_d$]
and let $t_{ix}$ be the starting time of $T_{ix}$. The invocation time of all task instances scheduled
to start in the interval [$t_{ix}$+1, $t_d$] must be at or after $t_{ix}$+1 and with deadline at or before
$t_d$, otherwise the EDF algorithm will not schedule $T_{ix}$ to start at $t_{ix}$. Hence, the processor
demand for the interval [$t_{ix}$, $t_d$], $d_{t_{ix},t_d}$, must be bounded from above by the inequality

$$d_{t_{ix},t_d} \le \mathrm{MET}_x + \sum_{y=1}^{x-1} N(y,\ t_d - (t_{ix}+1)) \times \mathrm{MET}_y$$

Since there is no idle time in [$t_{ix}$, $t_d$], and since a task missed a deadline at $t_d$, it
follows that $t_d - t_{ix} < d_{t_{ix},t_d}$.

Let L = $t_d - t_{ix}$. Then

$$\mathrm{FW}_1 < L < \mathrm{FW}_x$$

and

$$L < d_{t_{ix},t_d} \le \mathrm{MET}_x + \sum_{y=1}^{x-1} N(y,\ L-1) \times \mathrm{MET}_y$$

contradicting Condition 3 of Theorem 6. □

Note  that  Condition  3  in  Theorem  6  is  a  sufficient  but  not  necessary  condition  for 
schedulability  of  a  particular  concrete  task  set,  as  illustrated  by  the  following  example. 
Consider the task set T_1(100,150,150) and T_2(100,300,200). Clearly, it does not satisfy
Condition 2, yet a feasible schedule may still be found if their release times are zero.
However,  if  the  release  time  of  T2  is  changed  by  only  one  unit  of  time,  then  the  set  is  no 
longer  schedulable. 
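
The sensitivity to release times can be checked with a small simulation. The sketch below is illustrative only. It assumes each task is written as (MET, PER, FW) with the deadline of an instance equal to its release time plus FW, it uses a non-preemptive EDF dispatcher that never idles while an instance is ready, and it models the one-unit change as releasing T_1 one unit after T_2 (the direction of the shift is an assumption, since the text does not state it). The name `first_miss` is hypothetical.

```python
def first_miss(tasks, offsets, horizon):
    """Simulate non-preemptive EDF without inserted idle time.

    tasks   -- list of (met, per, fw) triples (assumed notation)
    offsets -- release time of the first instance of each task
    horizon -- instances released before this time are simulated
    Returns (task_index, deadline) of the first missed deadline, or None.
    """
    pending = []
    for idx, ((met, per, fw), off) in enumerate(zip(tasks, offsets)):
        release = off
        while release < horizon:
            pending.append((release, release + fw, met, idx))
            release += per

    t = 0
    while pending:
        ready = [job for job in pending if job[0] <= t]
        if not ready:
            # Processor idles until the next release.
            t = min(job[0] for job in pending)
            continue
        job = min(ready, key=lambda j: j[1])  # earliest deadline first
        pending.remove(job)
        t += job[2]                           # run to completion
        if t > job[1]:
            return job[3], job[1]
    return None


tasks = [(100, 150, 150), (100, 300, 200)]
print(first_miss(tasks, [0, 0], 600))  # simultaneous release
print(first_miss(tasks, [1, 0], 600))  # T_1 released one unit after T_2
```

With simultaneous release the dispatcher meets every deadline over two periods of T_2, while the shifted release makes the first instance of T_1 finish after its deadline. The simulation only exercises EDF; it is not an exhaustive feasibility test.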

Jeffay, et al. [JSM91] have shown that the problem of determining whether a
feasible schedule exists for a particular concrete task set is NP-Hard.

C.  THE  HARMONIC  BLOCK  DILEMMA 

It  is  a  well  known  and  accepted  result  that  the  least  common  multiple  (LCM)  of 
the  periods  of  a  periodic  task  set  provides  a  finite  interval  of  time,  for  which  a  cyclic 
schedule  can  be  calculated,  if  one  exists,  and  repeated  forever  [Mok83]. 
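
As a small illustration, the LCM of the periods and the number of instances each task contributes to one LCM interval can be computed as follows. This is a minimal sketch; the function names are hypothetical, and the period lists are taken from the task sets used as examples in this section.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of a list of positive integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def instances_per_hyperperiod(periods):
    """Number of instances of each task inside one LCM interval."""
    h = hyperperiod(periods)
    return [h // p for p in periods]


print(hyperperiod([600, 200]))                        # periods of T_1, T_2
print(instances_per_hyperperiod([100, 5, 100, 10]))   # four-task example
```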

Many interpret the above statement to mean that a cyclic feasible schedule must exist
only within the closed interval [0, LCM], i.e., a feasible schedule for all task instances
that must start in the interval [0, LCM] and complete execution by time LCM. Such an
interpretation holds only if the first instance of every task T_x is restricted to complete its
execution by time PER_x. But what if such a restriction is not desirable? It seems very
reasonable to allow the first instance of a periodic task to start within its period of
activation but finish up to the end of the period plus its computation time. This would
actually be a very desirable property if it could somehow ease the already difficult
problem of non-preemptive scheduling.

Consider the task set T_1(190,600,600) and T_2(20,200,200) with the precedence
relation T_1 < T_2, as illustrated in Figure 3.5.


Figure  3.5.  The  Transient  and  Cyclic  Schedules 


Clearly, no feasible schedule exists if the first instance of every task T_x is restricted
to complete its execution by time PER_x. However, if the first instance of every task T_x is
allowed to start by time PER_x and complete its execution by time PER_x + MET_x,
then a feasible schedule exists. Note also that the cyclic schedule no longer starts at time
zero, but instead at time t_c. Furthermore, there can be more than one task
instance that does not finish by time 2×LCM, as illustrated by the task set
T_1(4,100,100), T_2(2,5,5), T_3(2,100,100) and T_4(3,10,10), with precedence relations
T_1 < T_2 < T_3 < T_4.

This is where a novel approach to determining a suitable cyclic schedule comes into
play. The fundamental concept is that a feasible static schedule consists of two parts: a
transient part, which may be empty, followed by a cyclic part, which repeats forever.

The next theorem, the Harmonic Block Theorem, although different from the one
introduced by Zhu, et al. [ZLC94], was created after a careful analysis of their work,
which does not correctly solve the problem. The proof shows that if the premises of
Theorem 8 are satisfied, then there exists some time t_c at which a segment of the
schedule, exactly one LCM long, can be cut out such that it contains the correct number
of task instances and, most importantly, all of them start and finish within that interval.
This segment constitutes the cyclic part of the new schedule.

Theorem 8: The Harmonic Block Theorem

“If there exists an infinite feasible schedule S without any inserted idle time for a
periodic task set P with precedence constraints, such that the first instance of every task
T_x in P must start by time PER_x, then there exists an infinite feasible schedule S’
consisting of a transient portion of length at most LCM, followed by a cyclic portion of
length LCM that repeats forever.”

Proof: 

If there is no idling time period in the intervals [0, LCM] or [LCM, 2×LCM], then
the given set of periodic tasks P must have a load factor of 1, and the first instance of
every task T_x must finish its execution at or before time PER_x in any feasible schedule.
Hence, the segment of S in the interval [0, LCM] forms the cyclic portion of an infinite
feasible schedule satisfying the theorem.

Suppose now that idling time exists in the intervals [0, LCM] and [LCM, 2×LCM].
Let t_c be the end of the last period prior to time LCM in which the processor was idling in
S, and let t_i be the end of the last period prior to time t_c+LCM in which the processor was
also idling in S, as shown in Figure 3.6.

Figure 3.6. Determining the Start Time t_c of the Cyclic Schedule


Assertion  (1) 

Since no unnecessary idle time is inserted in our schedule S, it should be clear that
no first instance of any task can be activated after time t_c, because otherwise
it could have started execution before time t_c.

Assertion (2)

Another important point is that no task instance which starts after time t_i could
have been activated before time t_i, again because no idle time is inserted in our
schedule S.

Assertion (3)

Every task instance that is activated in the interval [t_i, t_c+LCM) must finish its
execution at or before t_c+LCM. Suppose this claim is not true. Then there must exist
some instances which are activated before t_c+LCM and cannot finish at or before t_c+LCM.
Denote the collection of all instances which are activated in the interval [t_i, t_c+LCM) by
τ. It follows from assertion (2) that every instance in τ must be activated in the interval
[t_i, t_c+LCM). This implies that

    Σ_{T_ix ∈ τ} MET_ix > t_c + LCM − t_i        (i)

Let τ′ denote the set of task instances that are activated in the interval [t_i−LCM, t_c).

It follows from assertion (1) that every task instance in τ must have a corresponding [...]

Note that all instances in τ′ must finish within the interval [t_i−LCM, t_c], because t_c is
the end of an idling period. Hence,

    Σ_{T_iy ∈ τ′} MET_iy ≤ t_c − (t_i−LCM) = t_c + LCM − t_i        (iii)

From inequalities (i), (ii), and (iii),

    t_c + LCM − t_i < Σ_{T_ix ∈ τ} MET_ix ≤ t_c + LCM − t_i,

which is a contradiction.

Assertion  (4) 

All instances after t_c are at least second instances and hence, for every task T_x,
there must exist LCM/PER_x activations within the interval [t_c, t_c+LCM). By assertion (3)
they all finish within this same interval. The segment of S in the interval [t_c, t_c+LCM)
therefore contains the correct number of instances.

Concluding the proof, the intervals [0, t_c] and [t_c, t_c+LCM] of S form respectively
the transient portion and the cyclic portion of the new schedule S’,
satisfying the consequence of the Theorem. □
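
The choice of t_c and t_i in the proof can be illustrated on a concrete schedule. The sketch below assumes the schedule is given as a sorted list of non-overlapping busy intervals (start, end), and it reads "prior to" as "ending at or before" the bound; both the representation and the function names are assumptions made for illustration. The busy intervals in the usage lines are invented data, not a schedule from this chapter.

```python
def last_idle_end_before(busy, bound):
    """End of the last idle gap that ends at or before `bound`.

    busy -- sorted, non-overlapping (start, end) execution intervals
    Returns None when the processor never idles before `bound`.
    """
    result = None
    prev_end = 0
    for start, end in busy:
        # A gap exists whenever an interval starts after the previous one ended.
        if start > prev_end and start <= bound:
            result = start
        prev_end = max(prev_end, end)
    return result


def cyclic_window(busy, lcm):
    """Return (t_c, t_i) as used in the proof, or None if no idle time exists."""
    t_c = last_idle_end_before(busy, lcm)
    if t_c is None:
        return None  # load factor 1: the cyclic portion starts at time zero
    t_i = last_idle_end_before(busy, t_c + lcm)
    return t_c, t_i


busy = [(0, 250), (300, 520), (560, 1100), (1150, 1400)]
print(cyclic_window(busy, 600))
```

For this invented schedule the cyclic portion would be the segment [t_c, t_c+LCM) = [560, 1160).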

As can be seen, by a proper choice of the start time of the cyclic portion of the
schedule, one can increase the schedulability of task sets which were previously assumed
to have no feasible schedule because the cyclic schedule was restricted to always start at
time zero. Note also that the same approach is valid for preemptive task sets.

D.  A  NOTE  ABOUT  PRECEDENCE  CONSTRAINTS 

Every reference to precedence constraints between tasks is usually attached to the
meaning of synchronization; in other words, if two tasks have some kind of
precedence relation, then they must be synchronized. Furthermore, if their periods are
different, then they should be synchronized at intervals corresponding to the least common
multiple of their periods. But then, what is the real need for synchronization if there are
cases where some data may well be lost? Does it exist only to enforce a fixed pattern on
how data are lost, e.g., instances three from task X and two from task Y, six and four, and
so forth will synchronize? These and other questions will be discussed further in
Chapter IV.

We  shall  argue  in  Chapter  IV  that  the  major  reason  for  synchronization  is  to 
guarantee  timely  processing  of  triggering  data.  We  shall  show  that,  by  relaxing  the  upper 
bound  on  the  delay  in  processing  each  instance  of  triggering  data,  we  can  guarantee  that, 
even  without  explicit  synchronization,  each  instance  of  the  trigger  data  will  be  processed 
within an interval equal to two times the period of the consumer operator. The removal of
the need for synchronization is particularly important in distributed systems, where
synchronization mechanisms are very costly if not impossible. It is also desirable not to
have synchronization in uni-processor systems, because we can then treat each
topological ordering of the tasks satisfying the precedence relationships as a concrete set
of periodic tasks, where the starting time of task T_x is greater than or equal to the sum of
the MET_y of all tasks T_y that are ancestors of T_x in the task graph.

Note that if non-zero latency is present in the edges of the precedence graph, then
we must further delay the starting time of the first instance of every task T_y, so that
S_1y ≥ max{S_1x + MET_x + LAT_xy, ∀ parent operator T_x of T_y}, where LAT_xy denotes
the latency associated with the edge (T_x, T_y).
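
The lower bound on the first-instance start times can be propagated through the task graph in topological order. The following is a minimal sketch under stated assumptions: the graph is acyclic, tasks without parents may start at time zero, and only the latency bound above is computed (processor contention is ignored). The function name, the dictionary layout and the latency value in the usage lines are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def earliest_first_starts(met, edges):
    """Smallest start times satisfying S_1y >= S_1x + MET_x + LAT_xy.

    met   -- dict: task name -> maximum execution time
    edges -- dict: (parent, child) -> latency on that edge
    """
    parents = {task: {} for task in met}
    for (x, y), lat in edges.items():
        parents[y][x] = lat

    # TopologicalSorter expects a mapping node -> set of predecessors.
    order = TopologicalSorter({t: set(p) for t, p in parents.items()}).static_order()

    start = {}
    for y in order:
        start[y] = max(
            (start[x] + met[x] + lat for x, lat in parents[y].items()),
            default=0,
        )
    return start


met = {"T1": 4, "T2": 2, "T3": 2, "T4": 3}
edges = {("T1", "T2"): 0, ("T2", "T3"): 1, ("T3", "T4"): 0}
print(earliest_first_starts(met, edges))
```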

In order for the arguments in the proof of Theorem 8 to hold, we need to choose t_c
to be the end of the first idling period after time LCM, resulting in a Modified Harmonic
Block Theorem that reads:

Theorem 9:

“If there exists an infinite feasible schedule S for a periodic task set P with precedence
constraints, such that the first instance of every task T_x in P must start by time PER_x, then
there exists an infinite feasible schedule S’ consisting of a transient portion of length at
most 2×LCM, followed by a cyclic portion of length LCM that repeats forever.”

Proof: 

The main difference when dealing with latencies is that idling periods may exist
before the starting time of the first instance of some task T_x in the schedule. Theorem 8
still holds for this case, because the presence of idling time only affects the release time of
the tasks, as long as PER_y ≥ S_1y ≥ max{S_1x + MET_x + LAT_xy}. However, unlike in
Theorem 8 of Section C, the cyclic portion of the schedule may now start after time LCM.
The reason is that the schedule S may contain first instances in the interval
[t_c, t_c+LCM], whose absence was the key to our previous proof of Theorem 8. After these
considerations, the same proof used for Theorem 8 can be applied to this case. □

E.  COPING  WITH  APERIODIC  TASKS 

Generally speaking, a sporadic task is defined as an aperiodic task that has a
minimum duration between two consecutive activations. Were that not so, neither the
static nor the dynamic approach could be used to guarantee schedulability.

If interrupts are used to detect the occurrence of aperiodic events at run-time, then
a dynamic approach should be used. However, in the static scheduling framework, where
all task requests must be known a priori so that a fixed, static schedule can be
generated, the only way to handle sporadic tasks, whose arrival times are not known
exactly, is to use a periodic process as a polling device. Its main role is to check for
requests of sporadic tasks and to serve them during its allocated time slot. However, due
to the random nature of aperiodic processes, we may not be able to handle a concentrated
set of arrivals or, even worse, may not catch them at all with the sporadic server approach.
To overcome this difficulty, several bandwidth preserving algorithms have been proposed,
among them the Priority Exchange, the Deferrable Server and the Sporadic Server [AB93].

The  CAPS  approach  was  to  use  one  sporadic  server  for  each  time-critical  sporadic 
operator.  This  approach,  although  very  restrictive,  is  the  only  way  to  guarantee  that  all 
time-critical  sporadic  tasks  would  be  serviced  in  a  timely  fashion  under  the  worst  case 
situation. 

Therefore, the next step is to convert the sporadic operator into a periodic one so
that all the original timing constraints from the sporadic operator are still satisfied.

1. The Conversion


The term triggering period (TP) will be used for the period of the converted
sporadic operator and the usual term FW for its finish-within. As shown in Figures 3.7
and 3.8, basically two cases can occur.

The first is when MCP < MRT − MET, and the equivalent periodic operator must
have TP ≤ MCP in order to satisfy the original timing constraints. We must also enforce that
FW = MRT − MCP, so that in the critical case shown in Figure 3.7, the data that was
missed by the previous triggering period can be consumed by the next TP and still finish
within the original MRT.


The second case, shown in Figure 3.8, occurs when MRT − MET ≤ MCP. This
more constrained situation forces a further reduction in the triggering period. Thus, the
new TP should satisfy TP ≤ MRT − MET and the FW should be equal to MET.


Figure 3.8. The Sporadic Conversion when MCP ≥ MRT−MET


In  general,  the  triggering  period  should  be 


MET ≤ TP ≤ min(MRT − MET, MCP).

Nevertheless,  in  order  to  minimize  the  impact  on  the  load  factor  of  the  prototype, 
it  is  desirable  that  TP  be  as  large  as  possible,  meaning  that 

TP  =  min(MRT  -  MET,  MCP). 
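
The two cases of the conversion can be summarized in a few lines of code. This is a minimal sketch of the rule as described in this section, assuming integer time units; the function name is hypothetical, and the numeric values in the usage lines are invented.

```python
def convert_sporadic(met, mcp, mrt):
    """Return (TP, FW) for the periodic operator replacing a sporadic one.

    met -- maximum execution time
    mcp -- minimum calling period
    mrt -- maximum response time
    """
    if mcp < mrt - met:
        # Case A: the period is limited by the minimum calling period.
        return mcp, mrt - mcp
    # Case B: the period is limited by MRT - MET, leaving no slack.
    return mrt - met, met


print(convert_sporadic(met=2, mcp=5, mrt=9))    # case A
print(convert_sporadic(met=2, mcp=10, mrt=9))   # case B
```

In both branches the returned TP equals min(MRT − MET, MCP), the largest admissible triggering period.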

Now,  assuming  that  the  values  for  TP  and  FW  have  been  established,  so  that  the 
original  timing  constraints  of  the  sporadic  operator  are  satisfied,  let's  see  what  kind  of 
relations  should  exist  between  the  original  values,  so  that  we  could  validate  them. 


Clearly:

•  MET ≤ MRT          (by Theorem 2)
•  MET ≤ MCP          (by Theorem 1)          Eq. (1)
•  MET ≤ TP           (by Theorem 1)
•  TP ≤ MCP           (for static scheduling)*
•  MET ≤ FW ≤ TP      (Scheduling Model)      Eq. (2)

* Otherwise we would have to be able to detect at run-time when new data had arrived, which is
only possible with dynamic scheduling.

For case A: MCP < MRT − MET

    TP = MCP                                  Eq. (3)

and

    FW = MRT − MCP                            Eq. (4)

Plugging (3) and (4) into (2),

    MET ≤ MRT − MCP ≤ MCP                     Eq. (5)

From the right inequality of (5),

    MRT ≤ 2×MCP

Plugging (1) into the left inequality of (5),

    MRT ≥ 2×MET

For case B: MRT − MET ≤ MCP

    TP = MRT − MET                            Eq. (6)

and

    FW = MET                                  Eq. (7)

Plugging (6) and (7) into (2),

    MET ≤ MET ≤ MRT − MET                     Eq. (8)

From the right inequality of (8),

    MRT ≥ 2×MET

Also,

    MRT − MET ≤ MCP   or   MRT − MCP ≤ MET

Plugging (1) into the above inequality,

    MRT − MCP ≤ MCP   or   MRT ≤ 2×MCP

Therefore the MRT for a sporadic operator must be upper bounded by twice its
MCP and lower bounded by twice its MET, as follows:

    2×MET ≤ MRT ≤ 2×MCP
Note  that  when  MRT  assumes  its  lowest  possible  value,  which  is  2  x  MET,  the 
triggering  period  TP  will  also  reflect  its  lowest  possible  value,  which  is  MET,  with  FW 
still  being  equal  to  MET.  This  case  is  illustrated  in  Figure  3.9. 
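
The bounds on MRT lend themselves to a simple validity check on the attributes of a sporadic operator before any conversion is attempted. The sketch below is an illustration with hypothetical function names and invented numbers; it also reproduces the boundary case MRT = 2 × MET.

```python
def mrt_bounds_ok(met, mcp, mrt):
    """True when 2*MET <= MRT <= 2*MCP holds for a sporadic operator."""
    return 2 * met <= mrt <= 2 * mcp


def converted_parameters(met, mcp, mrt):
    """Return (TP, FW) after validating the MRT bounds."""
    if not mrt_bounds_ok(met, mcp, mrt):
        raise ValueError("MRT must lie between 2*MET and 2*MCP")
    tp = min(mrt - met, mcp)
    fw = mrt - mcp if mcp < mrt - met else met
    return tp, fw


# Boundary case: MRT at its lowest admissible value, 2 * MET.
print(converted_parameters(met=3, mcp=4, mrt=6))
```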


Note  that  in  both  cases  the  conversion  of  a  sporadic  operator  results  in  very 
stringent  timing  constraints  to  the  equivalent  periodic  operator.  This  will  definitely  have  a 
great  impact  on  the  schedulability  of  the  prototype.  In  the  second  case,  for  example,  there 
is no slack time for the converted operator, since FW = MET. This forces us to reserve
portions of length MET in the schedule, where no other operator can be scheduled.

Of course, the amount of slack time for this operator can be increased by
decreasing its TP, but this will also increase the total load factor. Basically, there exists a
trade-off between load factor and slack time. How much to increase one to the detriment of
the other in order to increase schedulability is a very difficult question.

While this question does not have a definitive answer, the following analysis offers
suggestions to help designers find solutions that best fit their needs.

When converting a sporadic operator into an equivalent periodic one, the
triggering period (TP) can range from a minimum of MRT/2, where the slack time is equal
to MRT/2 − MET, up to a maximum value equal to min(MRT−MET, MCP), implying that
the slack time is max(MRT−MET−TP, 0).

First, define the load factor contribution as

    LFC = MET/TP − MET/TP_max,

i.e., the difference between the load factor for a specific triggering period TP and the load
factor if TP were set to its maximum value. Within the interval
MRT/2 ≤ TP ≤ min(MRT−MET, MCP), the slack time ST, which is the scheduling interval for
the sporadic task minus its computation time, is defined as ST = MRT − MET − TP, as can
be derived from Figures 3.7 and 3.8.
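
The trade-off between load factor and slack time can be tabulated directly from these definitions. The sketch below assumes integer attribute values and uses exact rational arithmetic to avoid rounding; the function names and the numbers in the usage lines are hypothetical.

```python
from fractions import Fraction


def tp_range(met, mcp, mrt):
    """Valid range of the triggering period: [MRT/2, min(MRT-MET, MCP)]."""
    return Fraction(mrt, 2), min(mrt - met, mcp)


def load_factor_contribution(met, tp, tp_max):
    """LFC = MET/TP - MET/TP_max."""
    return Fraction(met) / tp - Fraction(met) / tp_max


def slack_time(met, mrt, tp):
    """ST = max(MRT - MET - TP, 0)."""
    return max(mrt - met - tp, 0)


met, mcp, mrt = 2, 10, 12
tp_min, tp_max = tp_range(met, mcp, mrt)
for tp in (tp_min, 8, tp_max):
    print(tp, load_factor_contribution(met, tp, tp_max), slack_time(met, mrt, tp))
```

At the smallest admissible TP the slack equals MRT/2 − MET, and at the largest TP both the load factor contribution and the slack are zero for this choice of values.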

Clearly, when TP is maximum, the load factor contribution (LFC) is zero, in the
sense that TP cannot be increased any further. For the other values of TP, including those
enforced in the conversions for the previous cases A and B, some considerations must be
taken into account. Assume that MCP ≥ MRT−MET. Although it may appear at first that
LFC varies with MRT, since TP is lower bounded by MRT/2, that is not the case; MRT only
limits the valid range for TP. Figure 3.10 shows a family of curves for
different values of MCP, and for a fixed value of MET and MRT. As explained earlier,
LFC is insensitive to changes in MRT.

The load factor contribution LFC, as previously defined, is a function inversely
proportional to the triggering period TP, and it decreases faster for periods less
than TP_g = √MET, where its first derivative with respect to TP is equal to −1*. Note,
however, that TP cannot be smaller than MET, meaning that TP_g will always be located
to the left of any valid value for TP. The main conclusion is that different values of MCP
have a very small effect on the variation of LFC. A similar conclusion can also be drawn for
the case where MCP < MRT−MET. Therefore, in any case, the consequence is that we
always have the full range of TP, from MRT/2 up to min(MRT−MET, MCP), in which to
change TP without causing any harm to the load factor of the system.

* Care must be taken with the fact that the derivative at some point being equal to −1 does not
imply that the slope equals 135° at that point, since both axes may have different scales, as shown
in Figure 3.10.

Figure 3.10. Effects of TP on the Load Factor

Note  that  the  very  first  question  remains  unanswered,  but  now,  the  effects  in  the 
total  load  factor  are  more  clearly  understood  when  the  triggering  period  is  changed. 

2.  Important  Remarks  about  the  Conversion 

The idea of converting sporadic operators was first introduced by Mok
[Mok83] in his Lemma 2.3, which states:

“Let M = M_p ∪ M_s be an instance of a process model. Suppose we
replace every sporadic process T_i = (c_i, p_i, d_i) ∈ M_s by a periodic process
T’_i = (c’_i, p’_i, d’_i) with c’_i = c_i, p’_i = min(d_i − c_i + 1, p_i) and d’_i = c_i.
If the resulting set of all periodic processes M’ can be successfully scheduled,
then the original set of processes M can be scheduled without a priori
knowledge of the request times of the sporadic processes in M_s.”

Note, however, that although the idea of the transformation is valid, care must be
taken to consider the context in which the sporadic operator appears, since some of its
attributes, such as the minimum calling period, depend entirely on the producer of the
triggering data and not on the sporadic operator itself. In other words, if the producer of
data for some sporadic task is an external event handled by some kind of
interrupt handler, then the schedule has no influence whatsoever on the generation of the
data, and the minimum period will be obeyed by the external device. However, if the
producer is another task that will be included in our static schedule, it must be assured that
two consecutive instances of the producer operator will not be scheduled closer together
than the minimum period specified for the sporadic consumer. In this case, the
transformation alone is not enough, and an additional restriction must be imposed on the
producer of the data. This situation is depicted in Figure 3.11.
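
The additional restriction on the producer can be verified on a candidate static schedule. The sketch below assumes that the spacing between producer instances is measured between their scheduled start times, and that the optional `cycle_length` argument (a hypothetical parameter) gives the length of the repeating portion so that the wrap-around from the last instance of one cycle to the first instance of the next is also checked.

```python
def producer_spacing_ok(start_times, mcp, cycle_length=None):
    """Check that consecutive producer instances are at least MCP apart.

    start_times  -- scheduled start times of the producer within one cycle
    mcp          -- minimum calling period of the sporadic consumer
    cycle_length -- if given, also check the gap across the cycle boundary
    """
    starts = sorted(start_times)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if cycle_length is not None and starts:
        gaps.append(starts[0] + cycle_length - starts[-1])
    return all(gap >= mcp for gap in gaps)


print(producer_spacing_ok([0, 50, 100], mcp=50))
print(producer_spacing_ok([0, 40, 100], mcp=50))
print(producer_spacing_ok([0, 50, 100], mcp=50, cycle_length=120))
```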

In  conclusion,  it  can  be  said  that  Mok’s  lemma  by  itself  does  not  guarantee  that  a 
schedule  really  exists  for  the  original  set,  even  if  the  resulting  set  of  all  periodic  processes 
M’  can  be  successfully  scheduled,  unless  as  explained  earlier,  a  restriction  is  imposed  on 
the  producers  as  well. 
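
For reference, the replacement rule stated in Mok's lemma can be written as a one-line transformation. The sketch uses the lemma's own (c, p, d) notation for computation time, minimum period and deadline; the function name and the example values are hypothetical.

```python
def mok_periodic_equivalent(c, p, d):
    """Periodic process (c', p', d') replacing the sporadic process (c, p, d).

    Follows the rule c' = c, p' = min(d - c + 1, p), d' = c.
    """
    return c, min(d - c + 1, p), c


print(mok_periodic_equivalent(c=2, p=10, d=5))
print(mok_periodic_equivalent(c=2, p=3, d=9))
```

As discussed above, applying this rule is only part of the conversion when the producer of the triggering data is itself a scheduled task.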


Figure 3.11. Restrictions on the Producer Imposed by the Consumer’s MCP

3. Implementation Issues about the Conversion

When implementing this conversion it is strongly recommended that a careful
analysis of the task graph be made to determine reasonable bounds for the period of the
transformed sporadic operator. At first glance, an obvious upper-bound is the value of its
MCP. For the lower-bound, however, the choice is not so clear. Nonetheless, it is assumed
that after this pre-processing there will be an interval of possible values for the period of
the transformed sporadic task. The reason for these bounds is to provide some
margin for making the conversion, so that the final harmonic block of the entire set is not
increased significantly.

Given a set of sporadic operators, the following steps are suggested for the final
choice of their periods:

1) Set the period of every sporadic task to its upper-bound, so that the total load
factor is minimized;

2) Try to find a feasible schedule for the entire prototype. If this is not possible,
pick one sporadic task;

3) Start decreasing its period;

4) For each new period, check for schedulability;

5) Proceed until its lower-bound is reached. If no schedule is found, reset its period
to the upper-bound, pick another task and go back to step 3.
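
The steps above can be sketched as a simple search loop. This is an illustration only: the schedulability test is passed in as a callback named `is_schedulable`, a hypothetical stand-in for whatever scheduler is actually used, and the integer `step` by which a period is decreased is likewise an assumption.

```python
def choose_periods(bounds, is_schedulable, step=1):
    """Search for periods of converted sporadic tasks.

    bounds         -- dict: task name -> (lower_bound, upper_bound)
    is_schedulable -- callback taking a dict of periods, returning True/False
    Returns a dict of periods, or None if the search fails.
    """
    # Step 1: start every task at its upper-bound (minimum load factor).
    periods = {name: ub for name, (lb, ub) in bounds.items()}

    # Step 2: try to schedule the whole set as is.
    if is_schedulable(periods):
        return periods

    # Steps 3 to 5: shrink one task's period at a time.
    for name, (lb, ub) in bounds.items():
        candidate = ub - step
        while candidate >= lb:
            periods[name] = candidate
            if is_schedulable(periods):
                return periods
            candidate -= step
        periods[name] = ub  # reset before picking another task
    return None


# Toy schedulability test, for demonstration only.
result = choose_periods({"A": (5, 10), "B": (4, 6)}, lambda p: p["A"] <= 8)
print(result)
```

The loop changes the period of one task at a time, as in the steps above; it does not explore combinations in which several periods are reduced simultaneously.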

Another possible heuristic is to assign to the sporadic operator the period, among
those of the periodic operators, that is closest to but smaller than its upper-bound,
and then proceed with the schedulability tests. One could also try to minimize the
harmonic block. As can be seen, there are several possible heuristics, but there is no
optimal solution. Nevertheless, it is understood that, due to the very stringent timing
constraints resulting from the conversion, every possible attention should be given to this
step.

IV. DISTRIBUTED SCHEDULING


A.  INTRODUCTION 

For uniprocessor systems, most scheduling problems involving precedence
constraints can be solved in polynomial time. Lawler [Law73] showed that scheduling
non-preemptable tasks with unit computation times, deadlines, and arbitrary precedence
constraints can be accomplished using the Latest Deadline First Algorithm in O(n²) time.
Similar results were obtained by Lageweg, Lenstra, and Kan, even for tasks with an
arbitrary computation time, if the release times were assumed to be zero for all tasks.
Blazewicz [Bla76] proved that, for this scheduling problem, a preemptive schedule exists
if and only if a non-preemptive schedule exists. Therefore, in this case, preemption need
not be considered. Blazewicz also demonstrated that the Earliest Deadline First algorithm
can be used to schedule preemptable tasks. The only scheduling problem involving
precedence relations that has been proven to be NP-complete is the non-preemptable case,
where no restrictions are placed on the release times nor on the computation times. The
non-preemptable case is also NP-complete if there are no precedence relations among the
tasks [GJ77a].

Scheduling tasks with precedence constraints in multiprocessor systems is much
more difficult than doing so in uniprocessor systems. For example, scheduling tasks with
arbitrary precedence constraints and unit computation time is NP-hard both for the
preemptive and the non-preemptive cases [Ull75, Ull76].

Many researchers have attempted to develop efficient heuristic algorithms to
solve the general problem, but with limited success. In most cases, the researchers ended
up restricting the solution space to specific cases, such as when the task graph is a forest,
or when there are no precedence constraints.

In general, two different approaches to handling distributed computation can be
identified. In the first, the distributed system is coordinated by a single system clock,
which synchronizes all tasks so that computation progresses in a lock-step fashion, and
communication between tasks can only occur at specific times. In the second approach,
tasks are synchronized only when necessary, and do so by executing appropriate
handshake protocols. The former approach requires less inter-processor communication, but
is rigid, and relies on a global clock whose implementation is by itself another very
difficult problem to solve. The latter approach, although more flexible, dramatically
increases the complexity of the synchronization problem, and may be very costly in terms
of communication, since many acknowledge signals must be exchanged in order to maintain
proper synchronization. The use of rigorous and more constrained timing requirements
allows for the establishment of a weak form of synchronization among the tasks of the
distributed system, and represents an alternative in the middle [Mok83].

B.  ARCHITECTURAL  ISSUES 

This  section  is  not  intended  to  present  an  in-depth  analysis  of  the  effects  of  the 
architecture  on  distributed  scheduling,  but  merely  to  introduce  some  of  the  problems  so 
that  the  reader  may  be  aware  of  their  existence  and  importance. 

In  a  distributed  environment,  it  is  very  likely  that  one  will  have  to  deal  with 
heterogeneous  computers,  each  one  with  a  different  clock,  different  memory  systems,  and 
so  forth.  It  is  therefore  important  to  realize  how  these  attributes  can  affect  scheduling. 

1.  Different  Clocks 

The precision of a clock is directly related to its granularity, the minimum number
of ticks it can handle, and the quality of its time reference, which is usually based on some
kind of crystal. The first limiting factor imposed by the clock, therefore, is the minimum
acceptable period. This is not, however, an actual limitation, since typical clocks range
from tens to hundreds of megahertz, providing an order of nanoseconds for the minimum
allowable period. The real problem is that clocks can drift among themselves, causing a
variety of synchronization problems. Maintaining an accurate global clock is one of the
most challenging tasks in the distributed processing arena. Usually this is achieved at the
cost of substantial overhead in communications.

2. Speed of CPUs

The net result when different processors are present is a different execution time
for the same piece of code when running on the various processors. This factor
necessitates previous knowledge of allocation by the scheduler, so that it can be taken into
account. Within CAPS, this is accomplished automatically, because a kind of simulated
time is used for scheduling, which is scaled according to the speed of the machine on
which it runs.

3.  Memory 

Issues  like  cache  size,  paging,  number  of  pipelining  stages,  etc.,  can  affect  the 
overall  throughput  of  the  system,  and  consequently  the  timing  requirements,  but  hopefully 
all  of  these  different  delays  are  already  taken  into  account  by  the  specified  maximum 
execution  time  of  the  task. 

4.  The  Communication  Media 

This is one of the most important factors in dealing with distributed systems, and
can  greatly  affect  final  timing  requirements  for  the  application.  Note  also  that  the  timing 
requirements  are  affected  not  only  by  the  actual  transmission  delay,  but  also  by  the 
operating  systems  functions  invoked  on  behalf  of  the  applications.  In  CAPS,  for  example, 
although  there  is  a  time-bounded  protocol  (FDDI)  it  is  still  necessary  to  make  calls  to  the 
underlying  Unix  operating  system,  which  has  no  support  for  real-time  applications. 

5.  Interconnectivity 

The number of processors, the distance by which they are separated, their abilities
to communicate with one another, etc., are issues that should be raised before tackling the
scheduling problem.

C. THE PROBLEM STATEMENT

To  reiterate,  the  original  objective  of  this  research  was  to  find  better  methods  of 
supporting efficient and reliable scheduling of distributed hard real-time systems.

It is unquestionable that the ideal real-time distributed system should be able to
support groups of tasks running asynchronously in different processors, each processor
having its own internal clock. An additional goal, despite the precedence relations among
the tasks, would be to eliminate the need for enforcement of any kind of synchronization
required for communication. An even more important goal would be that all the deadlines
and other requirements (such as no loss of data, etc.) could be met.

Being aware of the complexity of the message routing problem described in
Chapter I and reviewing the alternatives presented in Section A, it appears that the
best available option to achieve the ideal system is the very last alternative, i.e., to sacrifice
timing constraints in order to decrease scheduling complexity. Unfortunately, that is not
the current trend in most research in the field of distributed scheduling today.
Researchers are still trying to find better heuristics for scheduling algorithms so that the
timing complexity for a sub-optimal case is decreased by some constant factor. But, due
to the NP-Hard nature of the problem, it is most likely that some restrictions will be
imposed on the initial problem.

This work moves in the other direction: it investigates ways of
restricting or relaxing the timing requirements so as to increase the chances of finding a
feasible schedule. It is understood, however, that, depending on the application, this
approach may not be practicable. It may well be that most of the timing requirements
cannot be changed at all. However, this is most likely untrue for most cases. Especially in
this application framework, where the user is prototyping the intended system in the early
stages of its life cycle, there is an opportunity to validate and change the system's
requirements, which makes this approach very attractive. Note, however, that this
discussion is not about missing deadlines or employing imprecise computations [LLS91],
but focuses simply on relaxing timing constraints so that no synchronization is needed,
consequently decreasing substantially the complexity of the distributed scheduling
problem.

The next section addresses the underlying semantics behind all possible
combinations of triggering conditions, stream types and operator types within a valid
PSDL program, so that later, when discussing the major synchronization issues, it is
certain that all cases have been covered.

D.  SYNCHRONIZATION  IN  PSDL 

There are two kinds of streams in PSDL, Sampled Streams (SS) and Data Flow
Streams (DF). Note, however, that within the former are two semantically different
sub-types of streams, depending on the triggering condition of the consumer operator. If the
consumer operator is not triggered (NT) by any data, then it should be understood that a
specific data value can be lost or overwritten, or even read over and over again by the
consumer, without any harm to the system. This type of behavior is very useful when
reading sensor data. In most cases, the sensors will be able to generate data at a much
higher rate than the consumer will read it, but the most recent data is of primary interest.
Even for tracking systems, where the history of data values is very important, this kind of
stream is still very useful. Note in Figure 4.1 that a specific value at some previous time t
is not relevant, because the consumer is only interested in the average behavior, so that the
filter algorithm can predict the future position of the target. In this kind of situation, no
synchronization is needed, releasing the producer and consumer operators from any
constraints on their periods.

The second type of Sampled Stream exists when the consumer operator is
TRIGGERED BY SOME (TBS) data value. By definition, the consumer with this
triggering condition should always catch a new piece of data if it is from one of the
streams specified in the TRIGGERED BY SOME clause. For example, if some operator
OP1 is TRIGGERED BY SOME X, Y, then, if new data is coming from either X or Y, it
should be guaranteed to be read, and not lost or overwritten.

Although buffer overflow or underflow is not an issue, due to the way sampled
streams are defined, the only way to avoid loss of data in this case is to enforce the
condition that PERproducer ≥ PERconsumer, and, consequently, the synchronization problem
will have to be handled accordingly.

Finally, in the case of Data Flow Streams, where the consumer is TRIGGERED
BY ALL, the inputs specified in the TRIGGERED BY ALL clause should be examined
for new data, and if all of them happen to have new data in their buffer, they should be
consumed, firing the operator. The TRIGGERED BY ALL condition can be thought of
as being a logical AND among the streams declared in the TRIGGERED BY ALL clause.
Clearly, in this case, there is also a need to enforce PERproducer ≥ PERconsumer so that no data
is lost, and once again the synchronization problem must be handled explicitly.

The  basic  semantic  difference  between  the  TRIGGERED  BY  ALL  data  flow 
streams  and  the  TRIGGERED  BY  SOME  sampled  streams  is  that  if  for  any  reason  the 
data  is  not  consumed  and  another  piece  of  new  data  arrives,  in  the  former  it  will  raise  a 
buffer  overflow  exception,  while  in  the  latter  the  data  will  be  simply  overwritten. 
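
The overflow-versus-overwrite distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CAPS implementation: the class names `DataFlowStream` and `SampledStream` and the exception `BufferOverflow` are hypothetical, and the default capacity of one is an assumption of the sketch.

```python
from collections import deque


class BufferOverflow(Exception):
    """Raised when a data flow stream is asked to hold more than it can."""


class DataFlowStream:
    """FIFO stream (hypothetical name): every value must be consumed."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.queue = deque()

    def write(self, value):
        # Unconsumed data plus a new arrival is an error, not a silent loss.
        if len(self.queue) >= self.capacity:
            raise BufferOverflow("new data arrived before the old was consumed")
        self.queue.append(value)

    def read(self):
        # Consuming removes the oldest value from the buffer.
        return self.queue.popleft()


class SampledStream:
    """Single-slot stream (hypothetical name): newer data replaces older data."""

    def __init__(self):
        self.value = None

    def write(self, value):
        # The previous value is simply overwritten.
        self.value = value

    def read(self):
        # Reading does not consume; the same value may be read repeatedly.
        return self.value
```

Writing twice to a `SampledStream` leaves only the second value, whereas a second write to a full `DataFlowStream` raises `BufferOverflow`.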

E.  DEALING  WITH  SPECIAL  CASES 

Data flow streams are currently implemented in CAPS as a FIFO queue of buffer
size one. This imposes an important restriction on the PSDL program: all
producers of data flow streams to some unique consumer should have the same period, or
a FIFO buffer overflow may occur in one of the streams, even if the condition
PERproducer ≥ PERconsumer is satisfied (Figure 4.2). This happens because OP1 may
write twice before OP2 outputs some value so that the triggering condition can be
satisfied. This problem usually reflects a possible design error, because it makes no sense
to have an operator being triggered simultaneously by two data events that are produced
at different rates. A possible and recommended solution is to force all producers of
data flow streams to a unique consumer to have the same period.


Figure  4.2.  Producers  with  Different  Periods 


Another  important  issue  is  that,  although  it  is  semantically  correct  in  PSDL  to  have 
several  operators  writing  to  the  same  data  flow  stream,  or  even  to  the  same  TRIGGERED 
BY  SOME  sampled  stream,  as  illustrated  in  Figure  4.3,  this  case  cannot  be  handled  unless 
an  upper-bound  is  placed  on  the  number  of  concurrent  copies  of  a  stream  in  a  PSDL 
program.  This  restriction  is  due  to  the  fact  that  streams  have  limited  buffer  size,  and  if  the 
number  of  copies  is  very  large  there  is  no  way  to  guarantee  that  one  operator  will  not 
write  to  the  stream  right  after  the  other,  and  therefore  cause  an  overflow.  In  the 
uniprocessor  case,  the  only  way  to  handle  this  problem  is  by  imposing  very  hard 
restrictions  on  the  period  of  the  consumers,  so  that  it  will  be  limited  to,  at  most,  half  of  the 
minimum MET of the producers. This result may be seen as an extrapolation to this case
of Nyquist's well-known sampling theorem. Currently, CAPS does not enforce this
condition.
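
A check for the period restriction stated above could look like the following Python sketch. The function names are hypothetical, and the only rule encoded is the one given in the text: the consumer period is limited to at most half of the minimum MET of the producers writing to the shared stream.

```python
def max_consumer_period(producer_mets):
    """Largest consumer period allowed: half of the smallest producer MET."""
    if not producer_mets:
        raise ValueError("at least one producer is required")
    return min(producer_mets) / 2


def consumer_period_ok(consumer_period, producer_mets):
    """True if the consumer is fast enough for all producers of the stream."""
    return consumer_period <= max_consumer_period(producer_mets)
```

For producers with METs of 20 and 30 time units, the consumer period may be at most 10 time units.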

Figure 4.3. Potential Overflow Situation


Still, due to the powerful semantics of PSDL, there is another problem to solve,
which is the possibility of the same stream being data flow for some consumers and
sampled stream for others, as illustrated in Figure 4.4. To make things worse, these
streams can even have different latencies.


Figure  4.4.  Different  Stream  Types  Combination 


Actually, there are some other cases that could also be checked, so that
users could receive some suggestions and warnings about their design. One example is
the case illustrated in Figure 4.5, where OPi could have its period increased,
consequently lowering the load factor, since nothing is gained by keeping its period
smaller than that of OPj.

Figure 4.5. Period Incompatibility among Operators

As one can expect, the above cases make the validation process of a PSDL
program very complex. For the sake of completeness, the semantic checks and stream
type derivations for all possible combinations of operator types and data triggering
conditions in PSDL are listed in Table 4.1. The actions which should be taken by the
scheduler for each one of those possible combinations will also be presented.

Table 4.1. PSDL Data Triggering Semantic Table


LEGEND

TC = Time-Critical Operator
NTC = Non-Time-Critical Operator
P = Periodic Operator
S = Sporadic Operator
SS = Sampled Stream
DF = Data Flow Stream


In Table 4.1, "upper" and "lower" represent, respectively, the maximum and the
minimum values the equivalent period of the sporadic operator can assume. They are
initially set, respectively, to infinity and zero. "Actual" is the value of the triggering period
of the sporadic operator after the conversion is done. As can be seen in Table 4.1, in all
TRIGGERED BY ALL cases it is necessary to prevent, or at least give warnings,
whenever the producer operator is faster than the consumer, so that no loss of data or
overflow will be incurred [Table 4.1(1)]. Similarly, in the TRIGGERED BY SOME
cases, this constraint must also be enforced, but in this case the motivation is to prevent
loss of data, since Sampled Streams, by definition, do not overflow [Table 4.1(2)].

When dealing with sporadic operators, upper and lower bounds are defined for
their  triggering  periods,  so  that  later,  when  conversion  of  the  sporadic  operators  to 
equivalent  periodic  operators  takes  place,  it  is  certain  that  all  of  these  constraints  are  taken 
into  consideration  [see  Table  4.1(3)].  The  sporadic  to  sporadic  case  (S-S)  cannot  yet  be 
handled  with  upper  and  lower  bounds,  since  there  can  be  up  to  five  different  possible 
overlapping  patterns  for  their  period  interval.  Hence,  final  checking  of  this  case  will  be 
delayed  until  the  equivalent  periods  have  been  calculated  [Table  4.1(4)]. 

Another  important  point  to  mention  is  that  consumers  with  no  data  triggering 
condition  must  be  periodic,  or  an  error  will  be  raised  [Table  4.1(5)]. 

Finally, although it is very unlikely, some non-time-critical operator may, for
unexpected reasons such as a lot of slack time left over from the static
scheduler, be fired more than once in the same
Harmonic Block, leading to a possible overflow if it is connected by data flow streams
to time-critical operators [Table 4.1(6)]. This is not a concern among NTCs, since all of
them will be executed consecutively; in other words, between two consecutive instances
of any NTC operator there is guaranteed to be an instance of each of the remaining ones
[Table 4.1(7)].

Table  4.2  presents  all  possible  combinations  of  the  PSDL  timing  constraints  and 
the resulting actions and checks to be performed by the scheduler.

Table 4.2. PSDL Timing Constraints Semantic Table

LEGEND
N = Not Supplied
S = Supplied

Table 4.2 shows that very few combinations of PSDL timing constraints are
semantically acceptable. The only one that deserves some explanation is the case where
only the MET is supplied. In this case, the scheduler picks a pair of values for MCP
and MRT, so that the individual load factor of the sporadic operator is equal to

max((0.75 - Σ LFperiodic) / (# of sporadic operators), 0.1)

This approach relieves the designer from having to define timing constraints for
sporadic operators, which might not be clear yet at that stage of the prototyping, and it
also tries to decrease the timing requirements for that sporadic operator. However, it is
dangerous, in the sense that it will always increase the load factor of the prototype to at
least 0.75, even if the total load factor for all periodic operators was very low.
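
The load factor rule above can be written as a short Python sketch. The function name and parameter names are hypothetical; the constants 0.75 and 0.1 are the ones given in the text, and the sketch assumes the remaining load is shared equally among the sporadic operators that supply only an MET.

```python
def sporadic_load_factor(periodic_load_factors, n_sporadic,
                         target=0.75, floor=0.1):
    """Individual load factor assigned to each MET-only sporadic operator."""
    if n_sporadic <= 0:
        raise ValueError("n_sporadic must be positive")
    # Share of the load still available below the 0.75 target.
    share = (target - sum(periodic_load_factors)) / n_sporadic
    # Never assign less than the 0.1 floor.
    return max(share, floor)
```

With a periodic load of only 0.1 and two such sporadic operators, each receives 0.325, so the total load factor becomes 0.75. This is the effect the text warns about.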

As is apparent, most of the semantic checks, mainly those related to the control
constraints part of the PSDL program, such as data triggering checks and timing
constraints checks, are left up to the scheduler to implement. It is proposed that in
future CAPS releases some of these checks be taken from the scheduler and inserted into
the Syntax Directed Editor (SDE), so that the user is not allowed to proceed to the
translation step until the PSDL program is valid. In doing so, the designer will not have
to come all the way back to SDE if a semantic error is found.

F.  TACKLING  THE  SYNCHRONIZATION  PROBLEM 

It is clear that the most important issues in dealing with synchronization are the
periods of producer and consumer tasks. However, even in the uniprocessor case, with the
period of the consumer being smaller than the period of the producer, it can be easily
shown that synchronization is not always a good alternative. Figure 4.6 shows an
example where no feasible schedule exists if synchronization is enforced, but one does exist
otherwise. Three outcomes are possible if synchronization is not required. First, if the
consumer operator is TRIGGERED BY ALL X, Y, the proposed schedule is valid but X
and Y will be consumed one instance later. If it is TRIGGERED BY SOME X, Y, then
the schedule is always valid, because X and Y do not need to be consumed together.
Finally, if there is no trigger, then the relative order is not important anyhow.

Figure 4.6. Reason for No Synch when PERprod ≥ PERcons (Uniprocessor Case)


From another perspective, if PERproducer < PERconsumer, then the streams connecting
them should be sampled streams, because otherwise the data flow streams would
overflow. Since the loss of data is possible ("possible" because the data might well not be
produced at all), the consumer cannot be TRIGGERED BY SOME either.

The only case in which PERproducer < PERconsumer can be allowed is when there is no
trigger at all. In this situation, synchronization is not needed, since it would place one
additional burden on the scheduler, and would not solve the problem of losing data. The
only advantage to having synchronization points in this case is the fact that there would be
a fixed pattern for losing data. Furthermore, by not having explicit synchronization, the
most that could happen is that the consumer operator would read either the previous or
the next instance of the data output by the producer, in other words, at most one producer
period apart.

Figure 4.7. Reason for No Synch when PERprod < PERcons (Distr. Case)


The second possibility is PERproducer = PERconsumer. In this case, synchronization
also does not solve the problem, since it is possible to have two instances of the producer
operator being scheduled, one after the other, causing overflow or loss of data depending
on the triggering condition. This case is illustrated in Figure 4.8.


At first, one may conjecture that no synchronization is needed when PERproducer >
PERconsumer, since it would be possible to catch every single occurrence of data ever
produced. However, this conjecture is untrue, due to the fact that the periodic input is not
periodic in the common sense that is understood in electrical engineering and other related
fields, as a pulse that occurs every t units of time.


Figure 4.9. Synchronization among Periodic Operators when FWa = METa

If that were so, the period ratio between producer and consumer would be a
necessary and sufficient condition for guaranteeing synchronization, according to the
following argument:

Assuming that PERb < PERa (Eq. (1)) and that the phase of operator A is zero,
there could be two cases:

1st case: the start of the second instance of B is less than the finish of the second
instance of A:

s2B < f2A    Eq. (2)

In this case B just lost A, and therefore it is necessary to prove that the third
instance of B will certainly catch the second instance of A. Formally,

s3B < f3A

By the definition of periodic operator, and from Eq. (1),

s3B = s2B + PERb    Eq. (3)

f3A = f2A + PERa    Eq. (4)

Plugging equations (1) and (2) into (3),

s3B < f2A + PERa    Eq. (5)

Finally, combining (4) and (5),

s3B < f3A □

2nd case: s2B > f2A, the case where the second instance of B will catch the
second instance of A. □

In general, s(i)B < f(i)A implies s(i+1)B < f(i+1)A and hence neither loss of data nor buffer
overflow can happen.

However, as explained before, the definition of periodic used here is slightly different, in the
sense that an instance may occur anywhere inside its period slot, invalidating the previous
argument.

Within this framework, things are made much more complex, and the
synchronization approach needs to change considerably.

The key question to be answered is: What is the real need for synchronization
between two operators, and when is it applicable? As shown in the previous examples,
synchronization does not solve the problem, and it places an additional burden on the
scheduler.

Another question to be asked is:

Can every single piece of data coming from both data flow streams and from
TRIGGERED BY SOME sampled streams be guaranteed to be consumed in a timely
fashion, so that no overflow or loss of data occurs?

The answer is clearly yes, if after scheduling each producer of a data flow or
TRIGGERED BY SOME sampled stream, the consumer of that data flow stream, or of
that sampled stream, is scheduled before the next instance of the producer.
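
The condition just stated can be checked mechanically on a candidate schedule. The following Python sketch assumes, purely for illustration, that a producer writes its output when it finishes and that a consumer reads its input when it starts. The function name is hypothetical.

```python
def consumed_between_producers(producer_finishes, consumer_starts):
    """True if a consumer instance starts between every pair of
    consecutive producer completions, so each value is read before
    the next one is written."""
    finishes = sorted(producer_finishes)
    starts = sorted(consumer_starts)
    # The last producer instance has no successor inside the schedule
    # and is therefore not checked here.
    for done, next_done in zip(finishes, finishes[1:]):
        if not any(done <= start < next_done for start in starts):
            return False
    return True
```

For producer completions at times 10, 110 and 210, consumer starts at 20 and 120 satisfy the condition, while a single consumer start at 20 does not.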

In a uniprocessor case, or even in a shared memory multiprocessor model, this
approach is acceptable and easy to implement and guarantee. This, by the way, is how it
is implemented right now in CAPS. However, in a truly distributed case, besides the
difficulty in implementing this approach, the lack of a master clock might cause a feasible
schedule to become infeasible. This assertion may be illustrated with a simple example.
Assume a schedule for a two-processor system that meets all deadlines and
synchronization requirements among their tasks, and that no buffer overflow occurs with
respect to the data flow streams. Now, if clock drift occurs in processor 2, so that one of
its consumers gets shifted more than twice the period of its corresponding data flow
producer, the consumer is guaranteed to lose data, and the schedule will fail.

Therefore,  although  the  uniprocessor  and  the  shared-memory  multiprocessor  cases 
can be handled appropriately, a new approach must be developed for the distributed case.
Ideally,  several  sets  of  communicating  processes  would  run  independently  in  each 
processor,  but  with  the  guarantee  that  no  data  would  be  lost  and  no  deadlines  missed. 

It will be useful to review the synchronization problem between producers and
consumers. What is the real meaning of missing a deadline within the context of a
real-time system? It means that some process did not generate its output within the specified
amount of time, and therefore the consumer could not consume the data, and so on. What
is important here is that missed deadlines are always attached to data not being generated
or consumed at the proper time, and this is going to be the key point in the approach,
i.e., attempting to guarantee that all data being generated is consumed in a timely fashion.

Clearly, the very first condition that must be satisfied is that PERproducer ≥
PERconsumer, so that no data is lost. It also seems obvious at first that the worst case that
can ever happen is when two consecutive instances of the producer are fired one after the
other, and the consumer is scheduled about two periods apart. Unfortunately this is not
true, as illustrated in Figure 4.10.

[Timelines of PRODUCER A and CONSUMER B]

Figure 4.10. The Consumer-Producer Paradigm

Figure 4.10 shows that even with a faster consumer (PERb ≤ PERa) one cannot
discard the possibility of having more than one, actually even three occurrences of the
slower producer between two consecutive instances of the consumer. This finding raises
the following additional questions:

1) Under what conditions could that happen?

2) Is there an upper-bound on the number of instances of producers between two
consecutive instances of the consumer? What would it be?

To answer these questions, carefully analyze Figure 4.10.

By construction:

PERa + 2 x METa ≤ 2 x PERb    Eq. (1)

and

PERb ≤ PERa    (Initial Assumption)

By definition of periodic operator:

0 ≤ METa ≤ PERa

By re-arranging Eq. (1):

METa ≤ PERb - PERa / 2

To answer the second question, let us assume the situation presented in Figure
4.11, where four instances of the producer are attempting to exist in between the same
two instances of the consumer.

Figure 4.11. Seeking an Upper-Bound

Eq. (1) now becomes

2 x PERa + 2 x METa ≤ 2 x PERb

Now let METa = 0, which is the best case possible. This results in PERb ≥ PERa, and in
PERb > PERa for any METa > 0, which contradicts the initial assumption PERb ≤ PERa.
Hence there is no solution for the set of inequalities, i.e., three is actually the upper-
bound.
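
The two sets of inequalities can be tested numerically. The Python sketch below uses hypothetical function names and encodes only the conditions given in the text: PERb ≤ PERa together with PERa + 2 x METa ≤ 2 x PERb for three producer instances, and with 2 x PERa + 2 x METa ≤ 2 x PERb for four. It assumes METa > 0, as for any real task.

```python
def three_instances_possible(per_a, met_a, per_b):
    """Can three producer instances fall between two consecutive
    consumer instances? Requires PERb <= PERa and Eq. (1)."""
    return per_b <= per_a and per_a + 2 * met_a <= 2 * per_b


def four_instances_possible(per_a, met_a, per_b):
    """Same question for four producer instances, using the
    modified form of Eq. (1)."""
    return per_b <= per_a and 2 * per_a + 2 * met_a <= 2 * per_b
```

For example, with PERa = 100, METa = 10 and PERb = 60, three instances are possible, while no combination with METa > 0 allows four.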

Based  on  these  results  the  following  lemmas  can  be  stated: 

Lemma  1: 

“Given  a  pair  of  operators,  where  one  is  a  producer  and  the  other  is  a  consumer, 
and  assuming  that  the  period  of  the  producer  is  bigger  than  the  period  of  the  consumer, 
there  can  exist  at  most  three  instances  of  produced  data  waiting  to  be  consumed  at  any 
instant  of  time”. 

Lemma  2: 

“Any  produced  data  will  be  consumed  within  at  most  two  periods  of  the 
consumer”. 

Finally, these lemmas allow the statement of the Fundamental Synchronization Theorem,
which will be most useful in the distributed case, but which can be applied as well in the
uniprocessor case.

Theorem 9:

"If there exists a feasible schedule that runs without buffer overflow or loss of
data in a shared memory multiprocessor model, then there can be a distributed and totally
independent schedule, without any kind of explicit synchronization, if the buffers of the
data flow streams, as well as those of the sampled streams with a TRIGGERED BY SOME
condition, have a size of three."

1.  Additional  Restrictions  Imposed  on  the  Timing  Constraints 

Obviously, a price is paid for getting rid of the synchronization, and it is reflected
in a more stringent set of timing constraints for tasks.

Looking back at Figure 4.10 it can be seen that the worst case that can happen is
to have some data from a producer consumed after 2 x PERb - METb units of time.

Currently, in PSDL, contrary to the sporadic case, there is no upper-bound on
the time within which an input data item for a periodic operator should be consumed. So,
if the consumer is a periodic operator that receives data from network streams, not using
synchronization will not impose any additional constraints on its timing requirements.

In  the  sporadic  case  however,  the  explicit  upper-bound  for  consuming  the 
incoming  data  is  its  MRT,  which  is  assumed  to  be  greater  than  or  equal  to  the  latency  plus 
the  MET  of  the  consumer  operator  for  the  incoming  data.  Therefore,  an  additional 
restriction  on  the  triggering  period  of  a  sporadic  operator  must  be  imposed  when  it  has 
any  data  coming  from  network  streams. 


Figure 4.12. New Timing Constraints for the Sporadic Operator

From Figure 4.12,

2 x TPb + LATmax ≤ MRTb

or

TPb ≤ (MRTb - LATmax) / 2

which is the new upper-bound for the triggering period of a sporadic operator.
From Chapter III, Section E, it is also known that TP ≥ MET. Hence,

METb ≤ TPb ≤ (MRTb - LATmax) / 2

which is the new formula for calculating the triggering period of a sporadic operator,
under the no synchronization assumption.
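
As a small worked illustration, the admissible range for the triggering period can be computed as follows. The function name is hypothetical, and the sketch encodes only the bounds MET ≤ TP ≤ (MRT - LATmax) / 2 stated above.

```python
def triggering_period_bounds(met, mrt, lat_max):
    """Return (lower, upper) bounds for the triggering period of a
    sporadic operator fed by network streams, or None if no value
    satisfies both bounds."""
    lower = met
    upper = (mrt - lat_max) / 2
    if upper < lower:
        return None  # the timing constraints cannot be met
    return lower, upper
```

With MET = 10, MRT = 100 and LATmax = 20, the triggering period may lie anywhere between 10 and 40 time units. With MET = 50 the range is empty.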

G.  THE  TASK  ALLOCATION  MODEL 

Two basic and unavoidable steps when designing distributed software systems are
the decomposition of the system functions into software processes during the early stages
of the design and, later on, the allocation of these processes to the several processors.
Although the two terms are sometimes used interchangeably, they are very different
activities.

Given the software requirements, the designer must first identify a set of logically
interrelated modules and perform its functional decomposition. This can be done with the
aid of traditional design methods, such as structured and object oriented design. For
real-time systems, such decomposition will require consideration of critical timing constraints
and may require introduction of special modules for synchronization [SW89].

The first major activity is partitioning, which is the mapping of these logical
modules into a set of physical processes. The second is allocation (sometimes called
assignment) which is the mapping of each process to one or more processors. The focus
of this chapter is on allocation; for further reading on partitioning see Shatz and Wang
[SW89].

As shall be seen, task allocation dramatically complicates the already complex
problem of distributed software design, because, when assigning m processes to n processors,
there are n^m different possible assignments. Optimal allocation is a problem of exponential
complexity, and it was proven to be NP-complete by Mok [Mok83].
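
The growth of the search space is easy to see in a few lines of Python. The function names and the process and processor labels are hypothetical; the sketch assumes each process is mapped to exactly one processor.

```python
from itertools import product


def count_assignments(m, n):
    """Number of ways to assign m processes to n processors."""
    return n ** m


def all_assignments(processes, processors):
    """Yield every mapping from process to processor."""
    for choice in product(processors, repeat=len(processes)):
        yield dict(zip(processes, choice))
```

Three processes on two processors give only 8 assignments, but 20 processes on 4 processors already give 4^20, which is more than 10^12, so exhaustive enumeration quickly becomes impractical.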

The key to process allocation is to establish an allocation model in terms of a cost
function and additional constraints that match the application requirements as far as logical
and timing correctness. The goal is to minimize the cost function under the constraints.
Most of the cost functions found in available literature deal with performance. Others,
such as those relating to reliability and fault-tolerance, are only now emerging [SW89].

The  most  widely  used  performance  cost  functions  are: 

1)  Interprocessor  communication  cost  (IPC)  which  is  a  function  of  the  amount  of 
data  transferred,  the  network  topology  and  link  capacity; 

2) Load balancing, which is a measure of how uniform the workload among the
processors is. A good load balancing will maximize the system stability, which
is the capability of busy hosts to receive bursty arrivals of processes without
compromising their deadlines.

3)  Completion  time,  the  total  execution  time  including  interprocessor 
communication  incurred  by  that  processor. 

The  most  frequent  constraints  found  in  typical  real-time  systems  are  due  to 
hardware  limitations  of  some  processors,  dependence  of  some  processes  on  certain 
processors,  and  number  of  available  processors. 

The  choice  of  a  cost  function  obviously  depends  on  the  application,  on  the 
underlying  hardware,  and  on  several  other  characteristics. 

Although distributed processing seems very attractive, one should be aware of the
saturation effect (Figure 4.13), which is sometimes forgotten by developers. The basic
consequence of this effect is that, contrary to expectations, the throughput does not
increase linearly as the number of processors is increased. At some point (which
can be as few as three or four processors) throughput actually starts to decrease.
Examples of this phenomenon are documented by Chu et al. [CHL80] and by Jenny
[Jen77]. The decrease in throughput is due to excessive interprocessor
communication, which is similar to the thrashing problem in the early memory paging
systems.

[Plot: THROUGHPUT versus # of PROCESSORS]

Figure 4.13. The Saturation Effect

Basically,  all  of  the  different  approaches  to  solve  the  allocation  problem  fall  into 
one  of  the  three  major  classification  areas:  graph  theoretic,  mathematical  programming,  or 
heuristic  methods,  which  are  by  no  means  mutually  exclusive. 

The first of these represents the processes to be allocated as nodes in a graph,
where each edge has a weight that is proportional to its inter-module communication cost
(IMC), with the following remarks: an IMC of zero means that no communication takes
place between those two modules, and an IMC of infinity means that they should be
assigned to the same processor. If a minimal-cut algorithm is performed on the graph, one
ends up with the minimum allocation cost for those modules between two processors. In
general, however, an extension of this method to an arbitrary number of processors
requires an n-dimensional min-cut flow algorithm, which quickly becomes
computationally intractable.

The mathematical programming approach uses, in most cases, the non-linear
integer programming technique, where the above problem is formulated as a set of
equations. It is very flexible in the sense that additional constraints can be included in the
model very easily; however, it has two shortcomings. First, it fails to accurately represent
real-time constraints and precedence relations among the tasks, because both factors
introduce queuing delays into the system in a complex manner [DSWE83].

Finally, the heuristic methods, unlike the first two, try to find sub-optimal solutions
for the assignment problem, which are in general faster, more extendible and simpler.

1.  Some  Basic  Definitions 


Defining several metrics will provide better insight into the problem.

Average Task MET - given n tasks, it is a lower bound on the response time:

\[ MET_{avg} = \frac{\sum_{i=1}^{n} MET_i}{n} \]

Average Load Factor - it is a kind of schedulability index that shows how tight the
system is. The bigger it is, the harder it is to find a schedule. It is independent of the number
of processors; e.g., LF_avg = 0.8 means that almost every operator is very CPU-intensive.
More precise insight could be obtained from the standard deviation of the load factor.

\[ LF_{tot} = \sum_{i=1}^{n} \frac{MET_i}{PER_i} \]

\[ LF_{avg} = \frac{LF_{tot}}{n} \]

Average Processor Load Factor - given the number of processors p, it specifies
the ideal share of processing so that a perfect load balancing is achieved:

\[ PLF_{avg} = \frac{LF_{tot}}{p} \]

Maximum Processor Load Factor - it specifies the maximum load factor each
processor can sustain using the minimum number of processors, p_min:

\[ PLF_{max} = \frac{LF_{tot}}{p_{min}} \]
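
The metrics defined so far can be computed directly from a list of (MET, period) pairs. The sketch below is illustrative only: the function and key names are hypothetical, and it assumes that the minimum number of processors is the total load factor rounded up, since a single processor cannot sustain a load factor above one.

```python
import math


def allocation_metrics(tasks, num_processors):
    """Compute the basic metrics for tasks given as (met, period) pairs.

    MET and period must be expressed in the same time unit.
    """
    n = len(tasks)
    lf_tot = sum(met / per for met, per in tasks)
    # Assumption of this sketch: each processor carries a load factor of at
    # most 1.0, so at least ceil(lf_tot) processors are required.
    min_processors = max(1, math.ceil(lf_tot))
    return {
        "met_avg": sum(met for met, _ in tasks) / n,
        "lf_tot": lf_tot,
        "lf_avg": lf_tot / n,
        "plf_avg": lf_tot / num_processors,
        "plf_max": lf_tot / min_processors,
    }


metrics = allocation_metrics([(2, 10), (5, 20), (30, 40)], num_processors=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example the total load factor exceeds one, so at least two processors are needed and the maximum processor load factor is half of the total.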


Placement Cost Matrix - it basically shows the cost incurred when operator X is
allocated to processor k. If some task must be placed in some specific processor, its
placement cost should be zero. Otherwise it should be infinity. Other values reflecting the
user's desires can also be used so that the scheduler will have more options when deciding
upon the allocation.

Placement Cost    Processor 1    Processor 2    Processor 3
Operator A        ∞              0              4
Operator B        0              ∞              7
Operator C        5              8              5

Table 4.3. Placement Cost Matrix
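
As an illustration of how such a matrix might be consulted, the sketch below stores the values of Table 4.3 and returns the cheapest processors for each operator. The names are hypothetical, and treating an infinite cost as "placement not allowed" is an assumption of this sketch.

```python
import math

INF = math.inf

# Rows of Table 4.3; column k holds the cost of placing the operator on
# processor k + 1.
PLACEMENT_COST = {
    "A": [INF, 0, 4],
    "B": [0, INF, 7],
    "C": [5, 8, 5],
}


def cheapest_processors(costs):
    """Return the 1-based processor numbers with the lowest finite cost."""
    best = min(costs)
    if best == INF:
        return []  # the operator cannot be placed anywhere
    return [k + 1 for k, cost in enumerate(costs) if cost == best]


for operator, costs in PLACEMENT_COST.items():
    print(operator, cheapest_processors(costs))
```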

Inter-Module  Communication  Cost  Matrix  -  it  basically  shows  the  cost  incurred 
when  operator  X  wants  to  communicate  with  operator  Y,  or  vice-versa,  using  the 
network.  Note  that  it  should  be  symmetric,  since  it  doesn’t  depend  on  the  way  the 
communication  is  carried  out.  It  simply  states  that  if  those  two  operators  are  allocated  in 
different  processors,  that  will  be  the  amount  of  communication  they  will  have  to  exchange. 
In  this  case  it  will  also  account  for  the  state  streams. 


Table 4.4. IMC Cost Matrix

Distance Cost Matrix - it takes into account the geographic distance between
processors. For all distances within a local area network, index 1 is assumed. When not
connected,  the  distance  is  assumed  to  be  infinite.  If  passage  through  additional  networks 
is  required,  there  will  be  an  increase  of  0.1  for  each  additional  level  of  networking.  Note 
that  the  base  purpose  of  this  matrix  is  to  see  if  the  specified  latencies  and  network  delays 
are  compatible  with  the  underlying  hardware  architecture. 


Table  4.5.  Distance  Cost  Matrix 
* Note that we will be using the terms IMC and IPC interchangeably.


2.  The Approach


The first attempt was to separate tasks according to their data dependency, which
was determined by calculating the slices of the prototype. Informally, a slice is
defined as the set of possible paths from a sink node (a node with no output edges) to a root
node (a node with no input edges), i.e., a slice contains all ancestors of a sink node. For a
formal definition see Dampier [Dam94]. Clearly, an operator can belong to more than one slice.


Figure 4.14. The Data Dependency Graph


After all slices are calculated, the operators that belong to the same slices are
grouped into equivalence classes, such as Ga, Gab, Gcde, etc., meaning that they belong to
slice A, slices A and B, or slices C, D, and E, respectively. The resulting graph is the Data
Dependency Graph, which is shown in Figure 4.14. The following algorithm can then be
applied:

1)  Pick those operators that belong to two slices. At least one operator must exist
in this equivalence class that has two edges, one for each of the slices it belongs
to. Pick the least expensive edge, i.e., the one with the least IMC cost, and add
the operator to this group. This may later prove to be something less than the
best choice, but for now it is the best option available without resorting to the
expensive method of checking the entire slice. The final partition is illustrated
by the dotted line in Figure 4.14, and presents a cost of 117 IMC units. To get
rid of this problem, instead of trying to join in a bottom-up fashion, the most
expensive edge not yet included in any group may be added, and an attempt can
be made to unite both groups, resulting in the following partition: {Ga, Gabd,
Gab}, {Gabc, Gc}, {Gde, Gd}, {Gb} and {Ge}, which has a cost of 56 IMC
units;

2)  Keep  doing  this  for  the  operators  belonging  to  three  slices,  four,  etc.,  until  all 
operators  have  been  processed. 

3)  If the load factor in some set exceeds one or some specified threshold, then the
set should be split into two by recursively applying the two-dimensional
minimal-cut algorithm, until all sets have a load factor less than one. Note that
since the min-cut algorithm is trying to minimize the cost of the edge, it may
well not be an optimal choice for minimizing load factor. Checking for load
factor is left until the end because the relative costs of those edges could not be
determined prior to completing the first two steps.
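
The slice calculation and the grouping into equivalence classes that precede these three steps can be sketched as follows. This is an illustrative sketch rather than the CAPS implementation: the function names are hypothetical, the graph is given as a plain edge list, and the sink node is counted as a member of its own slice, which is an assumption.

```python
from collections import defaultdict


def compute_slices(nodes, edges):
    """Map each sink node to its slice: the sink plus all of its ancestors."""
    parents = defaultdict(set)
    has_output = set()
    for src, dst in edges:
        parents[dst].add(src)
        has_output.add(src)
    slices = {}
    for sink in (n for n in nodes if n not in has_output):
        seen, stack = {sink}, [sink]
        while stack:  # depth-first walk towards the root nodes
            for parent in parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        slices[sink] = seen
    return slices


def equivalence_classes(nodes, slices):
    """Group operators by the exact set of slices they belong to."""
    classes = defaultdict(set)
    for node in nodes:
        key = frozenset(s for s, members in slices.items() if node in members)
        classes[key].add(node)
    return dict(classes)


nodes = ["r1", "r2", "x", "A", "B"]
edges = [("r1", "x"), ("x", "A"), ("x", "B"), ("r2", "B")]
slices = compute_slices(nodes, edges)
classes = equivalence_classes(nodes, slices)
```

Here operators `r1` and `x` feed both sinks, so they fall into the class that corresponds to Gab in the notation above, while `r2` belongs only to the slice of sink `B`.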

The intended result was to have several fairly data-independent sub-graphs that
could be assigned to different processors, having a minimum IPC cost and, most
importantly, providing a very nice modularization for the system with direct effects on
reliability. For example, if some processor had a problem, only those modules allocated in
that processor would fail. Of course, this approach did not take into account load
balancing, but at least provided a starting point.

Unfortunately, after running a partial implementation of this algorithm with several
randomly generated prototypes, its computation cost proved to be very high and most of the
prototypes ended up having very few slices to start with.

After analyzing the advantages and disadvantages of the initial attempt and several
other alternatives, it was decided to use the inter-module communication cost (IMC) as
the main cost function, without taking into consideration any data dependency.

Now  it  is  necessary  to  come  up  with  a  consistent  way  of  assigning  the  IMC  cost  to 
each  pair  of  operators  in  a  PSDL  graph. 

Clearly, in the PSDL context, where complex ADTs can travel through the
streams, the amount of data transferred by a stream is variable, and its actual size can only
be known at run-time when the actual prototype is executing. Therefore it is necessary to
use some kind of average or normalized value, so that the deviations are diminished.
Another assumption to be made (it is actually already part of the PSDL model) is that
every operator, when fired, outputs one and only one value per firing for each of its output
streams. Furthermore, the worst case is assumed, where, once activated, the operator will
always produce an output, even if the data triggering conditions or the output guards are
not satisfied.

The  IMC  cost,  represented  as  IMC_INDEX,  and  the  actual  amount  of  data  to  be 
transmitted  between  two  operators,  denoted  as  IMC_PER_SEC,  are  calculated  according 
to  the  algorithm  described  in  Figure  4.15. 


for each pair of operators loop
   if parent operator is TC then
      IMC_PER_SEC := CONNECTIVITY x AVG_PROC_TIME x 1000 / PERIOD_PRODUCER;
   elsif parent is NTC then
      IMC_PER_SEC := CONNECTIVITY x AVG_PROC_TIME x 1000 / HARMONIC_BLOCK;
   end if;
   IMC_INDEX := IMC_PER_SEC / NORMALIZED_LOAD_FACTOR;
end loop;

Figure  4.15.  Algorithm  for  Calculating  the  IMC  Cost  Function 

Note  that  in  order  to  quantify  and  compare  IMCs  it  was  necessary  to  fix  the  time 
window  for  measurement  and  the  second  was  chosen. 

AVG_PROC_TIME is the estimated average time in microseconds taken for that
system to output a typical PSDL stream to some buffer, which will be later transmitted to
the network. Note that this parameter is innocuous, since it is a constant for every stream.
The only reason to maintain the parameter is to make the resulting index more realistic.

CONNECTIVITY is defined as the number of streams connecting two operators,
including the state streams.

The ratio 1000 ms / PERIOD (ms) for the time-critical operator specifies the
number of periods that occur in one second, that is, the number of times the producer will
fire. For the non-time-critical operator the HARMONIC BLOCK (HB) is used, as if there
were only one occurrence of the NTC operator in each HB.

Finally, for the IMC_INDEX the NORMALIZED_LOAD_FACTOR is
introduced, defined as:

   (LOAD_FACTOR_PARENT + LOAD_FACTOR_CHILD) / MAX_LF_PER_PROC

Note that the above formula is valid for any case except when both operators are
NTCs. In this case the formula is changed to:

   ((1.0 - MAX_LF_PER_PROC) + (1.0 - MAX_LF_PER_PROC)) / MAX_LF_PER_PROC

or

   (2.0 / MAX_LF_PER_PROC) - 2.0

The rationale behind these formulas is that if there are two small-LF operators
connected by a stream with some IMC_PER_SEC, the IMC_INDEX or, rather, the
relative cost for placing them in different processors should be much higher than if they
had big load factors, for the same IMC_PER_SEC value. For streams connecting two NTC
operators, which don't have an explicit load factor since they have neither periods nor METs,
the remaining load factor, in other words 1.0 - TOTAL_LF, will be used as if it were the
load factor. If the total load factor is bigger than one, then there must be more than one
processor, so the maximum average load factor per processor is used instead,
assuming that the minimum number of processors is available.
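
The calculation of Figure 4.15, together with the two forms of the normalized load factor, can be sketched as below. The function and parameter names are hypothetical, and returning an infinite index when the normalized load factor is zero (which happens for two NTC operators when MAX_LF_PER_PROC is 1.0) is an assumption of this sketch, since the text does not address that case.

```python
import math


def imc_per_sec(connectivity, avg_proc_time_us, parent_is_tc,
                period_producer_ms=None, harmonic_block_ms=None):
    """Estimated traffic per second on the streams joining two operators."""
    # A TC producer fires once per period; an NTC producer is treated as
    # firing once per harmonic block.
    interval_ms = period_producer_ms if parent_is_tc else harmonic_block_ms
    return connectivity * avg_proc_time_us * 1000 / interval_ms


def normalized_load_factor(lf_parent, lf_child, max_lf_per_proc,
                           both_ntc=False):
    """Combined load factor of the pair relative to MAX_LF_PER_PROC."""
    if both_ntc:
        # Both operators use the remaining load factor 1.0 - MAX_LF_PER_PROC.
        return 2.0 / max_lf_per_proc - 2.0
    return (lf_parent + lf_child) / max_lf_per_proc


def imc_index(per_sec, nlf):
    """Relative cost of placing the two operators on different processors."""
    # Assumption: a zero normalized load factor means "never split".
    return math.inf if nlf == 0 else per_sec / nlf


traffic = imc_per_sec(2, 50, parent_is_tc=True, period_producer_ms=100)
index_tc = imc_index(traffic, normalized_load_factor(0.2, 0.2, 0.8))
index_ntc = imc_index(traffic, normalized_load_factor(None, None, 0.8,
                                                      both_ntc=True))
print(traffic, index_tc, index_ntc)
```

With MAX_LF_PER_PROC set to 0.8, two TC operators with load factors of 0.2 each and two NTC operators receive the same normalized load factor, and therefore the same index for equal traffic.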

Although it is not used in the current implementation, it seems to be a good idea to
divide the remaining LF among all NTC operators. This way it would be less costly to
split two NTCs, where the total load factor of the prototype is 0.8, than to split two TC
operators both with load factors of 0.2. In the current implementation, both cases have the
same cost.

3.  The  Current  Implementation 

As the very first step, the allocation algorithm builds a priority queue of edges in
decreasing order of inter-module communication cost (IMC_INDEX), which was
previously calculated. Note that it will contain all edges in the prototype and not only
those connecting time-critical operators.

Once  the  priority  queue  exists,  each  operator  is  allowed  to  form  a  set  by  itself. 
Next  a  union-find  algorithm  is  applied,  so  that  if  the  origin  and  destination  operators  of  the 
edge  being  examined  belong  to  different  sets,  they  are  united  (as  long  as  their  combined 
load  factor  is  still  under  some  threshold  previously  established  by  the  user). 

begin -- allocate

   -- Build a priority queue of edges in decreasing order of IMC_INDEX
   BUILD_PRI_QUEUE(COUNT);

   -- Let each operator form a distinct set by itself.
   for I in 1..NEW_GRAPH_PKG.ARRAY_SIZE loop
      OP := NEW_GRAPH_PKG.RETURN_OP(I);
      OP_UNION_FIND_PKG.CREATE(OP_LINK(I).OP);
   end loop;

   while IMC_PRIORITY_QUEUE.NON_EMPTY(PRI_QUEUE) loop
      EDGE := IMC_PRIORITY_QUEUE.READ_BEST(PRI_QUEUE);
      ROOT_A := OP_UNION_FIND_PKG.FIND(OP_LINK(EDGE.ORIGIN));
      ROOT_B := OP_UNION_FIND_PKG.FIND(OP_LINK(EDGE.DEST));
      if not OP_UNION_FIND_PKG.eq(ROOT_A, ROOT_B) then
         if ROOT_A.LF + ROOT_B.LF <= ALLOCATION_FACTOR then
            ROOT_C := OP_UNION_FIND_PKG.UNION(ROOT_A, ROOT_B,
                                              ALLOCATION_FACTOR);
         end if;
      end if;
      IMC_PRIORITY_QUEUE.REMOVE_BEST(PRI_QUEUE);
   end loop;

end allocate;

Figure  4.16.  Partial  View  of  the  Allocation  Program 
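
For readers who prefer an executable form, the same greedy merging can be sketched in a few lines. This is an illustrative re-expression, not the CAPS code: the function name and the data layout (a dictionary of load factors and a list of weighted edges) are assumptions.

```python
import heapq


def allocate(load_factor, edges, allocation_factor):
    """Greedily merge operators joined by the most expensive edges.

    load_factor: dict mapping operator name to its load factor.
    edges: list of (imc_index, origin, destination) tuples.
    Returns a list of sets, one per partition.
    """
    parent = {op: op for op in load_factor}  # each operator starts alone
    set_lf = dict(load_factor)               # load factor of each set root

    def find(op):
        while parent[op] != op:
            op = parent[op]
        return op

    # heapq is a min-heap, so the index is negated to pop the largest first.
    heap = [(-imc, a, b) for imc, a, b in edges]
    heapq.heapify(heap)
    while heap:
        _, a, b = heapq.heappop(heap)
        root_a, root_b = find(a), find(b)
        if (root_a != root_b
                and set_lf[root_a] + set_lf[root_b] <= allocation_factor):
            parent[root_b] = root_a
            set_lf[root_a] += set_lf[root_b]

    partitions = {}
    for op in load_factor:
        partitions.setdefault(find(op), set()).add(op)
    return list(partitions.values())


loads = {"A": 0.3, "B": 0.4, "C": 0.5, "D": 0.2}
links = [(10, "A", "B"), (8, "B", "C"), (5, "C", "D"), (1, "A", "D")]
result = allocate(loads, links, allocation_factor=0.8)
```

In this example the edge B-C is the second most expensive, but merging across it would exceed the allocation factor, so it ends up crossing the partition boundary.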

As can be seen, the current approach is a kind of first-fit bin-packing, where the
size of the bin is dictated by the ALLOCATION FACTOR specified by the user. A very
simple modification which would allow better load balancing is to substitute the
ALLOCATION FACTOR by the AVERAGE PROCESSOR LOAD FACTOR of the
prototype, multiplied by some number, for example, 1.1, to allow some variation around
the average. Doing this enforces that all processors will get an even load,
despite an increase in the communication cost. Other checks could be applied as well,
such as checking the requirements or the placement cost matrix to see if the operators
could be allocated to the same processor, or if they needed to be in a specific processor.
The slices they belong to could also be examined, so that even if the load balancing rule is
not completely satisfied they could still be assigned to the same processor if they were in
the same slice. As can be seen, there is an enormous number of possibilities for cost
functions. However, finding the one that best fits the application requires a great deal of
fine tuning.

The union-find data structure has been implemented as an in-tree, where the nodes
can have many children; therefore, after all the sets have been formed, we need an O(n^2)
worst-case algorithm in order to retrieve their members. Another way to implement it that
would make the retrieve operation much cheaper is by using a doubly linked list, but then
the insert operation would be a little more expensive. In both cases, the union-find
algorithm could be enhanced by adding path compression and balancing to the
implementation, resulting in an O(m log n) time algorithm, where m is the number of edges
in the graph.
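
A minimal sketch of the enhancement mentioned above is given below. It combines path compression with balancing by set size and retrieves all members in one pass over the nodes. The class and method names are hypothetical and do not correspond to OP_UNION_FIND_PKG.

```python
class DisjointSets:
    """Union-find with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return root_a
        # Balancing: hang the smaller tree under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return root_a

    def members(self):
        """Return all sets, found with one find per node."""
        groups = {}
        for x in self.parent:
            groups.setdefault(self.find(x), set()).add(x)
        return list(groups.values())


sets = DisjointSets("abcde")
sets.union("a", "b")
sets.union("c", "d")
sets.union("b", "d")
groups = sets.members()
```

Because every `find` flattens the path it walks, retrieving the members after the unions no longer requires following long parent chains for each node.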

Finally, the allocation algorithm outputs a set of sets, i.e., a set where each of the
components is another set containing the nodes in that partition. Although not included in
the current implementation, it should ultimately output a map instead of a set, where each
of the partitions would be mapped to a specific processor, according to the requirements.

V.  ARCHITECTURAL ISSUES OF THE CAPS SCHEDULER


Section A of this chapter describes several issues related to the architecture of the
CAPS scheduler in its current uniprocessor implementation. Section B presents a novel
architecture for dealing with the distributed scheduling case. The remaining sections of
this chapter contain a proposed implementation, first using the currently available
technology and then using the upcoming facilities offered by Ada95. It is important to
note, however, that while implementing the distributed system in Ada provides a uniform
environment for building prototypes, it suffers from the disadvantage that tasking and the
new distributed systems support in Ada95 are not time-bounded. Hence, in order for the
distributed Ada prototype to satisfy the timing constraints as specified, the average
behavior of the underlying host operating system and the network communication
subsystem must be relied upon.

A.  THE  CURRENT  SCHEDULER  -  UNIPROCESSOR  ARCHITECTURE 

Currently, CAPS is a development environment, implemented in the form of a
collection of tools that are linked together by a user interface. The prototyping process is
accomplished by running several tools independently, one after the other, so that their
outputs taken together make up the final Ada program, which will implement the
supervisory control of the prototype.

More  specifically,  the  translator  converts  the  PSDL  program  defined  by  the  user 
into  compilable  Ada  units.  During  this  process,  it  creates  the  following  five  major 
packages:  exceptions,  instantiations,  timers,  streams,  and  drivers,  all  preceded  by  the  name 
of the prototype followed by an underscore. Ultimately each of these will become part of
the  prototype  supervisory  Ada  program. 

The first three of these packages contain all of the user-declared exceptions,
generic packages and timer instantiations defined in the PSDL program. The package
streams contains the instantiations of all the streams used by the prototype, which are
implemented as Ada generic tasks contained in the generic package PSDL_STREAMS,
which contains all stream types supported by PSDL. A partial view of the supervisory
program for the Patriot Missile prototype is shown in Figure 5.1.

package PATRIOT_EXCEPTIONS is
   -- PSDL exception type declaration
   type PSDL_EXCEPTION is (UNDECLARED_ADA_EXCEPTION);
end PATRIOT_EXCEPTIONS;

package PATRIOT_INSTANTIATIONS is
   -- Ada Generic package instantiations
end PATRIOT_INSTANTIATIONS;

with PSDL_TIMERS;
package PATRIOT_TIMERS is
   -- Timer instantiations
end PATRIOT_TIMERS;

-- with/use clauses for atomic type packages
-- with/use clauses for generated packages.
with PATRIOT_EXCEPTIONS; use PATRIOT_EXCEPTIONS;
with PATRIOT_INSTANTIATIONS; use PATRIOT_INSTANTIATIONS;
-- with/use clauses for CAPS library packages.
with PSDL_STREAMS; use PSDL_STREAMS;
package PATRIOT_STREAMS is
   -- Local stream instantiations
   package DS_INTERCEPT_ANGLE_CONTROL_PATRIOT is new
      PSDL_STREAMS.FIFO_BUFFER(FLOAT);
   package DS_LAUNCH_ANGLE_LAUNCH_PATRIOT is new
      PSDL_STREAMS.FIFO_BUFFER(FLOAT);
   package DS_LAUNCH_STATUS_SCUD_RADAR is new
      PSDL_STREAMS.SAMPLED_BUFFER(LAUNCH_STATUS_RECORD);
   package DS_LAUNCH_STATUS_DISPLAY_SCUD is new
      PSDL_STREAMS.SAMPLED_BUFFER(LAUNCH_STATUS_RECORD);
   package DS_LAUNCHER_POSITION_SCUD_RADAR is new
      PSDL_STREAMS.SAMPLED_BUFFER(FLOAT);
   package DS_MISSILE_TRACK_CHECK_THREAT is new
      PSDL_STREAMS.SAMPLED_BUFFER(TRACK);
   package DS_SCUD_STATUS_DISPLAY_SCUD is new
      PSDL_STREAMS.SAMPLED_BUFFER(MISSILE_STATUS);
   package DS_SCUD_TRACK_DISPLAY_SCUD is new
      PSDL_STREAMS.SAMPLED_BUFFER(TRACK);
   package DS_TACTICAL_STATUS_DISPLAY_TACTICAL is new
      PSDL_STREAMS.SAMPLED_BUFFER(MISSILE_STATUS_RECORD);
   package DS_TARGET_RANGE_CONTROL_PATRIOT is new
      PSDL_STREAMS.FIFO_BUFFER(FLOAT);

   -- State stream instantiations
end PATRIOT_STREAMS;

Figure 5.1. Partial View of Patriot.a

Currently, the CAPS implementation supports only the sampled streams, where data
can always be written and read, the state streams, which are basically a sampled stream
with an initial value, and the data flow streams, which are implemented as a FIFO buffer
with size one. The streams are implemented as individual Ada tasks with entries such as
READ, WRITE and CHECK, whose implementation will vary according to the type of
stream.
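
The behavioral difference between the three kinds of stream can be sketched as follows. This is a simplified, single-threaded illustration rather than the Ada task implementation: the class names are hypothetical, the `new_data` flag stands in for the CHECK entry, and raising an overflow when writing to a full data flow buffer is an assumption.

```python
class BufferUnderflow(Exception):
    """Raised when a data flow stream is read without new data."""


class BufferOverflow(Exception):
    """Raised when a data flow stream is written while still full."""


class SampledStream:
    """Can always be written and read; a read returns the latest value."""

    def __init__(self):
        self.value = None  # reading before any write yields no useful data
        self.new_data = False

    def write(self, value):
        self.value, self.new_data = value, True

    def read(self):
        self.new_data = False
        return self.value


class StateStream(SampledStream):
    """A sampled stream that starts with an initial value."""

    def __init__(self, initial):
        super().__init__()
        self.value = initial


class DataFlowStream:
    """A FIFO buffer of size one: each value is consumed exactly once."""

    def __init__(self):
        self.value = None
        self.new_data = False

    def write(self, value):
        if self.new_data:
            raise BufferOverflow()
        self.value, self.new_data = value, True

    def read(self):
        if not self.new_data:
            raise BufferUnderflow()
        self.new_data = False
        return self.value
```

A sampled stream may be read repeatedly and simply returns the last value written, whereas a second read of a data flow stream without an intervening write raises `BufferUnderflow`.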

Finally, the package drivers basically contains all of the data declarations, the data
trigger checks that control whether a stream should or should not be read, the execution
trigger checks that decide whether or not to fire the operator, and the output guard
checks, which determine whether or not an output is to be written to the output streams.
Each of these checks is implemented in the following way:

1.  Data  Triggers 

If an operator has no triggering condition at all, its input streams will be read
whenever the operator is fired, but they will never generate any overflow or underflow
exceptions. A similar situation occurs when the streams are state streams.

If at least one of the incoming streams is a TRIGGERED BY SOME sampled
stream, then the streams will be read whenever one or more of the streams in the
TRIGGERED BY SOME set has new data, but again, they will never generate an
underflow exception. Because of this, care must be taken with respect to the very first
reading of data from sampled streams, since garbage may be consumed.

OPERATOR triggered_by_some
SPECIFICATION
END
IMPLEMENTATION GRAPH
   DATA STREAM
      altitude : float,
      range : float,
      target_id : INTEGER
   CONTROL CONSTRAINTS
      OPERATOR Update_Track TRIGGERED BY SOME altitude, range
      OPERATOR Radar
      OPERATOR [illegible]
      OPERATOR [illegible]
END

procedure UPDATE_TRACK_DRIVER is
   LV_ALTITUDE : FLOAT;
   LV_RANGE : FLOAT;
   LV_TARGET_ID : INTEGER;
   EXCEPTION_HAS_OCCURRED : BOOLEAN := FALSE;
   EXCEPTION_ID : PSDL_EXCEPTION;
begin
   -- Data trigger checks.
   if not (DS_ALTITUDE_UPDATE_TRACK.NEW_DATA or else
           DS_RANGE_UPDATE_TRACK.NEW_DATA) then
      return;
   end if;

   -- Data stream reads.
   begin
      DS_ALTITUDE_UPDATE_TRACK.BUFFER.READ(LV_ALTITUDE);
   exception
      when BUFFER_UNDERFLOW =>
         DS_DEBUG.BUFFER_UNDERFLOW("ALTITUDE_UPDATE_TRACK", "UPDATE_TRACK");
   end;
   begin
      DS_RANGE_UPDATE_TRACK.BUFFER.READ(LV_RANGE);
   exception
      when BUFFER_UNDERFLOW =>
         DS_DEBUG.BUFFER_UNDERFLOW("RANGE_UPDATE_TRACK", "UPDATE_TRACK");
   end;
   begin
      DS_TARGET_ID_UPDATE_TRACK.BUFFER.READ(LV_TARGET_ID);
   exception
      when BUFFER_UNDERFLOW =>
         DS_DEBUG.BUFFER_UNDERFLOW("TARGET_ID_UPDATE_TRACK", "UPDATE_TRACK");
   end;
end UPDATE_TRACK_DRIVER;

Figure 5.2. TRIGGERED BY SOME Implementation


If at least one of the incoming streams is a data flow stream, in other words, has a
TRIGGERED BY ALL condition, the streams will only be read if the data flow stream
has a new value in its buffer, and any attempt to read an old value from a data flow
stream will generate an underflow exception. As shown in Figures 5.2 and 5.3, the read
operation is actually a call to rendezvous with the READ entry of the incoming stream
task.

OPERATOR triggered_by_all
SPECIFICATION
END
IMPLEMENTATION GRAPH
   DATA STREAM
      alarm : Boolean,
      command : INTEGER,
      temperature : FLOAT
   CONTROL CONSTRAINTS
      OPERATOR Oven_Control TRIGGERED BY ALL
      OPERATOR Temp_Alarm
      OPERATOR Temp_Sensor
      OPERATOR Input_Keypad
END

procedure OVEN_CONTROL_DRIVER is
   LV_ALARM : BOOLEAN;
   LV_COMMAND : INTEGER;
   LV_TEMPERATURE : FLOAT;
   EXCEPTION_HAS_OCCURRED : BOOLEAN := FALSE;
   EXCEPTION_ID : PSDL_EXCEPTION;
begin
   -- Data trigger checks.
   if not (DS_TEMPERATURE_OVEN_CONTROL.NEW_DATA and then
           DS_ALARM_OVEN_CONTROL.NEW_DATA) then
      return;
   end if;

   -- Data stream reads.
   begin
      DS_ALARM_OVEN_CONTROL.BUFFER.READ(LV_ALARM);
   exception
      when BUFFER_UNDERFLOW =>
         DS_DEBUG.BUFFER_UNDERFLOW("ALARM_OVEN_CONTROL", "OVEN_CONTROL");
   end;
   begin
      DS_COMMAND_OVEN_CONTROL.BUFFER.READ(LV_COMMAND);
   exception
      when BUFFER_UNDERFLOW =>
         DS_DEBUG.BUFFER_UNDERFLOW("COMMAND_OVEN_CONTROL", "OVEN_CONTROL");
   end;
   begin
      DS_TEMPERATURE_OVEN_CONTROL.BUFFER.READ(LV_TEMPERATURE);
   exception
      when BUFFER_UNDERFLOW =>
         DS_DEBUG.BUFFER_UNDERFLOW("TEMPERATURE_OVEN_CONTROL", "OVEN_CONTROL");
   end;

   -- PSDL Exception handler.
end OVEN_CONTROL_DRIVER;

Figure 5.3. TRIGGERED BY ALL Implementation


2.  Execution  Triggers 

The execution trigger is where the actual program that implements the
functionality of that operator, which is provided by the user, will be called if the conditions
are satisfied. These conditions come from the TRIGGERING IF part of the PSDL
program. Note that even if they are not satisfied, the data has already been consumed, and
is therefore marked as old data.


Figure  5.4.  TRIGGERING  IF  Implementation 


3.  Output  Guards 

Finally,  the  output  guards  are  checked.  If  the  conditions  are  satisfied,  a 
rendezvous  with  the  output  stream  tasks  is  requested  by  calling  their  WRITE  entry. 
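
The order of the three checks performed by a driver can be summarized by the following sketch. It is illustrative only: `Stream` is a minimal single-threaded stand-in for a CAPS stream task, and `run_driver` with its parameters is a hypothetical name, not the generated Ada driver.

```python
class Stream:
    """Minimal stand-in for a stream task with a new-data flag."""

    def __init__(self, value=None):
        self.value, self.new_data = value, False

    def write(self, value):
        self.value, self.new_data = value, True

    def read(self):
        self.new_data = False  # the value is now old data
        return self.value


def run_driver(inputs, output, operator, trigger_set=(), by_all=False,
               triggering_if=lambda *args: True,
               output_guard=lambda result: True):
    """Run one activation of an operator driver; return True if it fired."""
    # 1. Data trigger check: TRIGGERED BY ALL needs new data on every
    #    stream in the set, TRIGGERED BY SOME on at least one of them.
    if trigger_set:
        flags = [inputs[name].new_data for name in trigger_set]
        if not (all(flags) if by_all else any(flags)):
            return False
    # 2. Read all input streams; this consumes the data.
    args = [stream.read() for stream in inputs.values()]
    # 3. Execution trigger: call the user's operator only if it holds.
    if not triggering_if(*args):
        return False
    result = operator(*args)
    # 4. Output guard: write the result only if the guard allows it.
    if output_guard(result):
        output.write(result)
    return True


inputs = {"altitude": Stream(0.0), "range": Stream(0.0)}
out = Stream()
first = run_driver(inputs, out, lambda a, r: a + r,
                   trigger_set=("altitude", "range"))
inputs["altitude"].write(100.0)
second = run_driver(inputs, out, lambda a, r: a + r,
                    trigger_set=("altitude", "range"))
```

The first call returns without reading anything because no stream in the trigger set has new data; the second call fires after `altitude` is written. When the execution trigger fails, the inputs have nevertheless been consumed, as noted in the text.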


Figure 5.5. Output Guards Implementation

Besides these packages that are generated by the Translator, there are another two
packages generated by the Static Scheduler and by the Dynamic Scheduler. When
consolidated by one of the CAPS scripts, they will form the so-called prototype
supervisory program, receiving the name of the prototype followed by a ".a" extension,
which stands for an Ada program.


CAPS Support Packages

Exception Declarations
Generic Instantiations
Timer Instantiations
Data Stream Instantiations
Operator Drivers

Dynamic Schedule Task Package:
   while true loop
      call non-time-critical operator drivers;
   end loop;

Static Schedule Task Package:
   while true loop
      call time-critical operator drivers;
   end loop;

Main Program:
   procedure prototype_name is
   begin
      init_hardware_model;
      start static schedule;
      start dynamic schedule;
   end prototype_name;


Figure  5.6.  CAPS  Supervisory  Program  Structure 


CAPS is composed of four major Ada tasks with the following priorities, as
defined in the package PRIORITY_DEFINITIONS:

1)  Debugger  Task  -  it  handles  all  CAPS  debugging  tools  used  during  prototype 
execution,  and  has  the  highest  priority  within  CAPS,  which  is  4 

2)  Stream Tasks - each stream is implemented as one Ada task with priority 3

3)  Static Scheduler Task - it is responsible for calling all the time-critical
operators, according to the static schedule. The TC operators will be called in
a non-preemptive way, so that each instance of an operator will execute to
completion, being preempted only by the debugger task, or during operations
with the stream tasks. It has a priority of 2. Note that, although the stream
tasks have higher priority, they are called (synchronized) by this task, so that
there will be no problems such as another stream from another operator trying
to gain control of the CPU.

4)  Dynamic  Scheduler  Task  -  it  is  assigned  the  lowest  priority  (priority  1)  within 
CAPS,  and  it  handles  all  the  non-time  critical  operators  of  the  prototype.  They 
will  run  in  a  pre-defined  order  established  by  the  dynamic  schedule,  whenever 
there  is  idle  time  in  the  static  schedule.  The  NTC  operators,  due  to  their  low 
priority,  can  be  preempted  by  any  other  task  and,  as  a  matter  of  fact,  they  are 
not  even  guaranteed  to  run  at  all.  This  problem  of  unbounded  blocking  will  be 
addressed  later  on. 

B.  THE  PROPOSED  DISTRIBUTED  ARCHITECTURE 

In  the  uniprocessor  case,  the  translator  had  no  information  about  the  output  of  the 
scheduler.  For  the  distributed  case,  however,  this  information  is  crucial,  since  it  will  have 
to  generate  different  Ada  units  for  each  of  the  processors  involved  in  the  prototyping. 

Once  the  scheduler  has  generated  the  different  partitions,  defining  which  operator 
belongs  to  which  partition,  the  translator  will  have  to  be  called,  so  that  it  can  generate  as 
many supervisory files as the number of partitions. It is suggested that the prototype name
followed by the partition number be used as the naming convention for the supervisory
files, e.g., PATRIOT_1.a, PATRIOT_2.a, and so on.

The following information should be passed by the scheduler to the translator, so
that it can perform its job:

1)  Number  of  partitions  and  a  list  with  the  operator  names  belonging  to  each 
partition 




2)  Mapping  from  partitions  to  processors  according  to  the  requirements 

For the sake of simplicity, it is assumed that there is a homogeneous cluster of
processors, so that a configuration of partitions is not needed. (The process of mapping
the partitions of a program to the nodes in a distributed system is called configuring the
partitions.) Note, however, that even after dropping item 2, there is still a
need to provide the translator with the names of the processors. It is suggested that this
information come from the CAPS interface.

Once  this  information  is  available  to  the  translator,  it  should  generate  a  supervisory 
file  for  each  partition,  exactly  as  it  did  for  the  uniprocessor  case,  except  for  the  following 
differences: 

1)  In  the  new  package  streams,  where  the  streams  are  instantiated,  if  a  specific 
stream  is  going  to  some  operator  external  to  that  partition,  and  only  in  that 
case,  it  should  be  hard-coded  as  an  instantiation  of  a  special  and  newly  created 
kind of stream, i.e., the network stream. Note that this stream has only one
entry, write_external, since all reads will be to local
streams. Certainly, the package PSDL_STREAMS will have to be totally
changed  to  conform  with  the  new  model  for  distributed  scheduling  without 
synchronization,  which  requires  a  buffer  size  of  three  for  the  network  streams. 
Another  modification  made  in  this  package  relates  to  the  sampled  streams, 
which  are  now  divided  into  two  groups,  non-triggering  (NT)  and  TRIGGERED 
BY  SOME  (TBS),  since  they  have  quite  different  semantic  behaviors.  Figure 
5.7  shows  the  specification  of  the  new  package  containing  the  stream  tasks. 

with PRIORITY_DEFINITIONS;
use PRIORITY_DEFINITIONS;
package PSDL_STREAMS is

   BUFFER_OVERFLOW  : exception;
   BUFFER_UNDERFLOW : exception;

   -- Implements a buffer with size 1, for sampled
   -- streams with no triggering condition (NT)
   generic
      type ELEMENT_TYPE is private;
   package NT_SAMPLED_BUFFER is
      task BUFFER is
         pragma PRIORITY(BUFFER_PRIORITY);
         entry READ(VALUE : out ELEMENT_TYPE);
         entry WRITE(VALUE : in ELEMENT_TYPE);
      end BUFFER;
   end NT_SAMPLED_BUFFER;

   -- Implements a buffer with size 3, for sampled
   -- streams that have triggering "BY SOME"
   -- condition (TBS)
   generic
      type ELEMENT_TYPE is private;
   package TBS_SAMPLED_BUFFER is
      task BUFFER is
         pragma PRIORITY(BUFFER_PRIORITY);
         entry CHECK(NEW_DATA : out BOOLEAN);
         entry READ(VALUE : out ELEMENT_TYPE);
         entry WRITE(VALUE : in ELEMENT_TYPE);
      end BUFFER;

      function NEW_DATA return BOOLEAN;
   end TBS_SAMPLED_BUFFER;

   -- Implements a buffer with size 1, for state streams
   -- that have no triggering condition (NT)
   generic
      type ELEMENT_TYPE is private;
      INITIAL_VALUE : ELEMENT_TYPE;
   package NT_STATE_BUFFER is
      task BUFFER is
         pragma PRIORITY(BUFFER_PRIORITY);
         entry READ(VALUE : out ELEMENT_TYPE);
         entry WRITE(VALUE : in ELEMENT_TYPE);
      end BUFFER;
   end NT_STATE_BUFFER;

   -- Implements a buffer with size 3, for state streams
   -- that have triggering "BY SOME" condition (TBS)
   generic
      type ELEMENT_TYPE is private;
      INITIAL_VALUE : ELEMENT_TYPE;
   package TBS_STATE_BUFFER is
      task BUFFER is
         pragma PRIORITY(BUFFER_PRIORITY);
         entry CHECK(NEW_DATA : out BOOLEAN);
         entry READ(VALUE : out ELEMENT_TYPE);
         entry WRITE(VALUE : in ELEMENT_TYPE);
      end BUFFER;

      function NEW_DATA return BOOLEAN;
   end TBS_STATE_BUFFER;

   -- Implements a buffer with size 3, for dataflow
   -- streams, that is, those that have the triggering
   -- "BY ALL" condition
   generic
      type ELEMENT_TYPE is private;
   package FIFO_BUFFER is
      task BUFFER is
         pragma PRIORITY(BUFFER_PRIORITY);
         entry CHECK(NEW_DATA : out BOOLEAN);
         entry WRITE(VALUE : in ELEMENT_TYPE);
         entry READ(VALUE : out ELEMENT_TYPE);
      end BUFFER;

      function NEW_DATA return BOOLEAN;
   end FIFO_BUFFER;

   -- Implements a buffer with size 1, for networked
   -- stream, no matter what kind of streams they are
   with A_STRINGS; use A_STRINGS;
   with ADA_STREAMS;
   with SYSTEM_RPC;
   generic
      type ELEMENT_TYPE is private;
      PROC        : SYSTEM_RPC.PARTITION_ID;
      STREAM_NAME : in A_STRING;
   package NETWORK_BUFFER is
      task BUFFER is
         pragma PRIORITY(BUFFER_PRIORITY);
         entry WRITE_EXTERNAL(
            VALUE       : in ELEMENT_TYPE;
            PROC        : in SYSTEM_RPC.PARTITION_ID;
            STREAM_NAME : in A_STRING);
      end BUFFER;
   end NETWORK_BUFFER;
end PSDL_STREAMS;


Figure 5.7. The New PSDL_STREAMS Ada Package Specification


2) The new drivers package should contain only the driver procedures related to
the operators belonging to that partition. It is very important to notice that the
distributed scheduling model assumes that a stream resides, i.e., is
instantiated, in the same processor or partition as its consumer operator.*
Therefore, for the consumer operator, it is irrelevant where the data came from,
and, furthermore, no changes will be needed for the individual driver
procedures within this package, since all the reads will be to local streams. The
only change would occur if it was necessary to perform a write to an external
operator. In this case, the write operation should be hard-coded by the
translator as a call to write_external, an entry of the special network stream
task. In Figure 5.8, which presents the network stream task body, it is apparent
that, after this rendezvous is accepted, there should be a call to some
inter-processor communication routine, e.g., DO_APC, that would deliver the
message. It is also at this point where most of the problems are going to
appear, as shall be seen.


with A_STRINGS; use A_STRINGS;
with ADA_STREAMS;
with SYSTEM_RPC;
package body NETWORK_BUFFER is
   task body BUFFER is
      PARAMETERS : SYSTEM_RPC.PARAMS_STREAM_TYPE(3);
      -- This type allows multiple stream elements within the
      -- same stream, depending on its declaration
   begin
      loop
         accept WRITE_EXTERNAL(VALUE       : in ELEMENT_TYPE;
                               PROCESSOR   : in SYSTEM_RPC.PARTITION_ID;
                               STREAM_NAME : in A_STRING) do
            SYSTEM_RPC.DO_APC(PROCESSOR, PARAMETERS);
            -- parameters will include the remote procedure name,
            -- the psdl_stream_name and value
         end WRITE_EXTERNAL;
      end loop;
   end BUFFER;
end NETWORK_BUFFER;


Figure  5.8.  Body  of  the  Network  Stream  Task 


* This assumption will require that all exceptions from external streams be handled, and
consequently hard-coded, on the consumer's side.


The changes made so far are very minor, since most of the burden is being put on
the write operation to external streams. In fact, the most difficult part of this
implementation is finding a way to receive the incoming messages from the different
processors and operators. Some kind of communications server, which will have the duty of
receiving and routing all the incoming messages to their final destinations, will be needed.
Due to the semantics of PSDL, in order to reliably implement this communication, it will
be necessary to send some kind of header containing the consumer operator, the name of
the stream and the name of the destination processor along with the data.
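
A minimal C sketch of such a header follows. The struct layout, the 32-byte name limit, and the helper names `make_header` and `same_destination` are assumptions made for illustration; the dissertation does not fix a wire format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32  /* assumed maximum identifier length */

/* Hypothetical routing header sent along with each stream value. */
struct stream_header {
    char consumer_operator[NAME_LEN];  /* operator that reads the stream */
    char stream_name[NAME_LEN];        /* PSDL stream being written */
    char dest_processor[NAME_LEN];     /* processor hosting the consumer */
};

/* Fill a header, truncating over-long names. Returns 0 on success,
   -1 if any argument is NULL. */
int make_header(struct stream_header *h, const char *consumer,
                const char *stream, const char *processor)
{
    if (h == NULL || consumer == NULL || stream == NULL || processor == NULL)
        return -1;
    snprintf(h->consumer_operator, NAME_LEN, "%s", consumer);
    snprintf(h->stream_name, NAME_LEN, "%s", stream);
    snprintf(h->dest_processor, NAME_LEN, "%s", processor);
    return 0;
}

/* Two headers address the same destination buffer only when all three
   fields match; the stream name alone is not enough. */
int same_destination(const struct stream_header *a,
                     const struct stream_header *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->consumer_operator, b->consumer_operator) == 0
        && strcmp(a->stream_name, b->stream_name) == 0
        && strcmp(a->dest_processor, b->dest_processor) == 0;
}
```

A communications server could use a comparison like `same_destination` to route two writes that carry the same stream name to different consumers.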

These requirements for the header come from situations such as when the same
operator is trying to write to the same stream into different operators in different
partitions. This case is illustrated in Figure 5.9. In the next section the different options
available for implementing this communication subsystem are described.


Figure 5.9. Justification for the Header Information

C.  IMPLEMENTATION  ISSUES  OF  THE  COMMUNICATION  SUBSYSTEM 

One of the most important design issues is the choice of the communication
subsystem. It is recommended to use the remote procedure call (RPC) paradigm as
opposed to the traditional message passing mechanism. The reasons for this choice are that
RPC is widely implemented for interprocess communication between computers across a
network, being supported by most of the emerging distributed operating systems. Several
standards have been initiated by organizations such as ISO and OSF. This method also
provides an asynchronous form, relaxing the original synchronous semantics of RPC.
Finally, Annex E (Distributed Systems) of the Ada95 Reference Manual makes it the
choice, though not mandatory, for future implementations of this Annex. [Ada95]

1.  The  RPC  Model 

The  remote  procedure  call  model  is  similar  to  the  local  procedure  call  model.  In 
the local case, the caller places arguments to a procedure in some well-specified location.
It  then  transfers  control  to  the  procedure,  and  eventually  gains  back  control.  At  that 
point,  the  results  of  the  procedure  are  extracted  from  the  well-specified  location  and  the 
caller  continues  execution.[Sun90] 

The  remote  procedure  call  is  similar.  That  is,  the  caller  process  sends  a  call 
message  to  the  server  process  and  waits  (blocks)  for  a  reply  message.  The  call  message 
contains the procedure's parameters, among other things. The reply message contains the
procedure's results, among other things. Once the reply message is received, the results of
the  procedure  are  extracted,  and  the  caller's  execution  is  resumed.[Sun90] 

Note that in this model, only one of the two processes is active at any given time.
The RPC protocol, however, does not prevent an implementation from allowing the calling
routine to do some useful work while waiting for the reply (asynchronous mode).

2.  The  First  Approach 

The first idea was to implement the RPC paradigm by using the standard RPC
libraries. However, in order to do that within CAPS, it would be necessary to call from
inside an Ada task, more specifically from inside the network tasks, a C routine that would
implement the RPC calls (see Figure 5.8). The reason for a C routine is that there is no
library support or existing bindings for implementing RPC from inside Ada83. It would
not be difficult to write an Ada wrapper to the C routine. However, the biggest problem
to be dealt with is how to pass the Ada parameters to the C routine, which could be very
complicated abstract data types from the PSDL prototype. Assuming that this problem
could somehow be solved, there is an additional problem: How could this C routine pass
the complex ADTs through the streams? In the Unix/C world, there currently exists a
great deal of support for these kinds of operations.

For example, the rpcgen utility is basically a compiler that accepts a remote
program interface definition written in the RPC language, which is very similar to C, and
outputs a C program containing all the client routines, the server routine, and, most
importantly, all the XDR filter routines. An XDR routine converts procedure arguments
and results into the network format (sequential streams) and vice versa.

The  External  Data  Representation  (XDR)  standard  comprises  a  set  of  library 
routines  that  allow  a  C  programmer  to  describe  arbitrary  data  structures  in  a  machine- 
independent  fashion.  XDR  is  the  backbone  of  Sun's  RPC  package,  in  the  sense  that  data 
for  remote  procedure  calls  is  transmitted  using  this  standard.  It  was  designed  to  work 
across  different  languages,  operating  systems,  and  machine  architectures. 

It is important to note, however, that most of the time required to prepare a data
structure for transfer is not spent in conversion but in traversing the elements of the data
structure. To transmit a tree, for example, each leaf must be visited and each element in a
leaf record must be copied to a buffer and aligned there. Storage for the leaf may have to
be deallocated after the data is sent. Similarly, to receive a tree, storage must be allocated
for each leaf, data must be moved from the buffer to the leaf and properly aligned, and
pointers must be constructed to link the leaves together. [Sun90]
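
The traversal cost described above can be illustrated with a small C sketch that flattens a binary tree of integers into a contiguous buffer in pre-order and rebuilds it on the receiving side. The node layout, the presence-flag encoding for empty subtrees, and the function names are assumptions made for illustration; real XDR filters also handle byte order and alignment, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *left, *right;
};

/* Flatten the tree into buf in pre-order. Each node contributes a
   presence flag (1) followed by its value; an empty subtree contributes
   a single 0. Returns the number of ints written, or (size_t)-1 if buf
   is too small. */
size_t tree_pack(const struct node *t, int *buf, size_t cap)
{
    size_t n, m;
    if (t == NULL) {
        if (cap < 1) return (size_t)-1;
        buf[0] = 0;
        return 1;
    }
    if (cap < 2) return (size_t)-1;
    buf[0] = 1;
    buf[1] = t->value;
    n = tree_pack(t->left, buf + 2, cap - 2);
    if (n == (size_t)-1) return n;
    m = tree_pack(t->right, buf + 2 + n, cap - 2 - n);
    if (m == (size_t)-1) return m;
    return 2 + n + m;
}

/* Rebuild a tree from buf, allocating one node per packed element and
   advancing *pos past the consumed ints. Returns NULL for an empty
   subtree. Malformed input or allocation failure trips an assert. */
struct node *tree_unpack(const int *buf, size_t len, size_t *pos)
{
    struct node *t;
    assert(*pos < len);
    if (buf[(*pos)++] == 0)
        return NULL;
    assert(*pos < len);
    t = malloc(sizeof *t);
    assert(t != NULL);
    t->value = buf[(*pos)++];
    t->left = tree_unpack(buf, len, pos);
    t->right = tree_unpack(buf, len, pos);
    return t;
}

/* Release every node of a tree built by tree_unpack. */
void tree_free(struct node *t)
{
    if (t == NULL) return;
    tree_free(t->left);
    tree_free(t->right);
    free(t);
}
```

Both directions visit every node once, and the receiver performs one allocation per node, which is where the text locates most of the cost.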

In this case what is needed is a remote procedure called receive, running in all the
machines, ready to intercept any incoming messages, and another routine, namely send,
that will also run in all machines and will remotely call the receive routine. In Figure 5.10
both routines, which were successfully tested in the C environment, are presented. Note
that the send routine is not sending anything, but merely passing parameters to the remote
procedure receive.

RPC_REC.C

/* receive.c - remote procedures; called by server stub. */

#include <stdio.h>
#include <string.h>   /* for strcpy */
/* standard RPC include file */
#include <rpc/rpc.h>
/* this file is generated by rpcgen */
#include "RPC_receive.h"

/* Receive a string of chars and reply with a status */
char **
receive_1(message)
char **message;
{
    static char status[20] = "OK";
    static char ptr[100];
    static char *ptr1;

    printf("Received message = %s\n", *message);
    fflush(stdout);
    ptr1 = &status[0];
    strcpy(ptr, *message);
    /* ptr1 = &ptr[0]; */
    return (&ptr1);
}

RPC_SEND.C

/* RPC_send.c - client program for remote receive service. */

#include <string.h>
#include <stdio.h>
#include <stdlib.h>   /* for exit */
/* standard RPC include file */
#include <rpc/rpc.h>
/* this file is generated by rpcgen */
#include "RPC_receive.h"

main(argc, argv)
int argc;
char *argv[];
{
    CLIENT *cl;            /* RPC handle */
    char *receiver_name;
    char **status;
    char *message;

    if (argc != 3) {
        fprintf(stderr, "usage: %s hostname message\n", argv[0]);
        exit(1);
    }
    receiver_name = argv[1];
    message = argv[2];

    /* Create the client "handle" */
    if ((cl = clnt_create(receiver_name,
                          DISTR_SCHEDULE, CAPS95, "udp")) == NULL) {
        /* Can't establish connection with receiver */
        clnt_pcreateerror(receiver_name);
        exit(2);
    }

    /* call the remote procedure "receive_1" */
    printf("Message to be transmitted = %s\n", message);
    fflush(stdout);
    if ((status = receive_1(&message, cl)) == NULL) {
        clnt_perror(cl, receiver_name);
        exit(3);
    }
    printf("Status from remote receiver %s is %s\n",
           receiver_name, *status);
    clnt_destroy(cl);      /* done with the handle */
    exit(0);
}

Figure  5.10.  The  RPC  Programs  for  the  New  Scheduler 


Finally, even if both problems have been solved, i.e., the parameter passing between C
and Ada on the sender side and the Ada bindings for the XDR routines, there is still an
additional problem on the receiver side due to the way RPC is now implemented in C. The
receiving, or server, routine is implemented as a forever loop by calling the RPC
library routine svc_run(). To overcome this problem one would need to be able to call an
Ada procedure from inside a C routine, and again the same problem of passing parameters
would be present.

Another  approach,  such  as  using  files  to  exchange  data  between  C  and  Ada,  could 
be  used,  but  then  other  problems,  such  as  file  locking,  and  internal  synchronization 
between  C  and  Ada  tasking  (so  that  no  data  could  be  overwritten  before  being  consumed) 
would  come  into  play. 

Because of all these problems, it seems that a better solution is needed, and just
such a solution is present in the Ada95 implementation, which will be described next.

3.  The  Ada95  Approach 

Annex E defines facilities for supporting the implementation of distributed systems
using multiple partitions working cooperatively as part of a single Ada program. These
facilities include pragmas for categorizing library units according to the role they play in
the distributed system, such as Shared_Passive, Remote_Types and
Remote_Call_Interface, and other mechanisms for supporting communication and access
to shared data. [Ada95]

The Partition Communication Subsystem (PCS), as defined in Annex E, provides
facilities for supporting communication between the active partitions of a distributed
program by using the remote procedure call interface (RPC). The annex also proposes a
specification for the RPC interface between active partitions within the PCS, which will be
contained in the package System.RPC. Figure 5.11 introduces the proposed specification
for the package System.RPC.

with Ada.Streams;
package System.RPC is

   type Partition_ID is range 0 .. implementation-defined;
   Communication_Error : exception;

   type Params_Stream_Type (Initial_Size : Ada.Streams.Stream_Element_Count) is new
      Ada.Streams.Root_Stream_Type with private;

   procedure Read(Stream : in out Params_Stream_Type;
                  Item   : out Ada.Streams.Stream_Element_Array;
                  Last   : out Ada.Streams.Stream_Element_Offset);

   procedure Write(Stream : in out Params_Stream_Type;
                   Item   : in Ada.Streams.Stream_Element_Array);

   -- Synchronous call
   procedure Do_RPC(Partition : in Partition_ID;
                    Params    : access Params_Stream_Type;
                    Result    : access Params_Stream_Type);

   -- Asynchronous call
   procedure Do_APC(Partition : in Partition_ID;
                    Params    : access Params_Stream_Type);

   -- The handler for incoming RPCs
   type RPC_Receiver is access procedure(Params : access Params_Stream_Type;
                                         Result : access Params_Stream_Type);
   procedure Establish_RPC_Receiver(Receiver : in RPC_Receiver);

private
   -- not specified by the language
end System.RPC;

Figure 5.11. Package System.RPC (Specification)

As noted in Figure 5.11, during the execution of a remote subprogram call, most
of the parameters (and later results, if any) are passed using a stream-oriented
representation which is suitable for transmission between partitions. The annex calls this
action marshalling. Unmarshalling is the reverse action of reconstructing the parameters
or results from the stream-oriented representation. Note that there is no defined
standard for this transformation, but nevertheless the XDR standard seems to be the choice
of most of the Ada compiler vendors.

The type Partition_ID is used to identify a partition, and Params_Stream_Type is
used  for  identifying  the  particular  remote  subprogram  that  is  being  called,  as  well  as 
marshalling  and  unmarshalling  the  parameters  or  result  of  a  remote  subprogram  call,  as 
part  of  sending  them  between  partitions.  The  Read  and  Write  procedures  override  the 
corresponding  abstract  operations  for  the  type  Params_Stream_Type. 

Both  synchronous  and  asynchronous  communication  are  supported,  and  are 
implemented  by  the  procedures  Do_RPC  and  Do_APC,  respectively.  Both  procedures 
send  a  message  to  the  active  partition  identified  by  the  Partition  parameter.  The  first  one 
blocks the calling task until a reply message comes from the called partition, or some error
is detected by the PCS, in which case Communication_Error is raised at the point of the
call  to  Do_RPC.  Do_APC  operates  in  the  same  way  as  Do_RPC,  except  that  it  is  allowed 
to  return  immediately  after  sending  the  message. 

Finally, the procedure Establish_RPC_Receiver is called only once, immediately
after elaborating the library units of an active partition, but prior to invoking the main
subprogram, if any. The Receiver parameter designates an implementation-provided
procedure called the RPC-receiver, which will handle all RPCs received by the partition.
Establish_RPC_Receiver saves a reference to the RPC-receiver. When a message is
received at the called partition, the RPC-receiver is called with the Params stream
containing the message. When the RPC-receiver returns, the contents of the stream
designated by Result are placed in a message and sent back to the calling partition.

The  implementation  of  the  RPC-receiver  shall  be  reentrant,  thereby  allowing 
concurrent  calls  on  it  from  the  PCS  to  service  concurrent  remote  subprogram  calls  into  the 
partition. 

a.  The  Package  Streams 

A stream is a sequence of elements comprising values from possibly
different types, and allowing sequential access to these values. A stream type is a type in
the class whose root type is Streams.Root_Stream_Type. [Ada95]

The types in this class represent different kinds of streams. The pre-defined
stream-oriented attributes like T'Read and T'Write make dispatching calls on the Read and
Write procedures of the Root_Stream_Type.

package Ada.Streams is
   pragma Pure(Streams);

   type Root_Stream_Type is abstract tagged limited private;
   type Stream_Element is mod implementation-defined;
   type Stream_Element_Offset is range implementation-defined;
   subtype Stream_Element_Count is
      Stream_Element_Offset range 0 .. Stream_Element_Offset'Last;
   type Stream_Element_Array is
      array(Stream_Element_Offset range <>) of Stream_Element;

   procedure Read(Stream : in out Root_Stream_Type;
                  Item   : out Stream_Element_Array;
                  Last   : out Stream_Element_Offset) is abstract;

   procedure Write(Stream : in out Root_Stream_Type;
                   Item   : in Stream_Element_Array) is abstract;

private
   -- not specified by the language
end Ada.Streams;


Figure 5.12. Package Ada.Streams (Specification)

Read operations transfer Item'Length stream elements from the specified
stream to fill the array Item. The index of the last stream element transferred is returned in
Last. Last is less than Item'Last only if the end of the stream is reached.
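
The Read contract can be modelled in a few lines of C. This is only an analogy under stated assumptions: the `mem_stream` type and `stream_read` function are hypothetical, and the function returns a count of elements transferred, whereas the Ada procedure returns the index Last.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream: data[pos..len) is still unread. */
struct mem_stream {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Copy up to item_len elements into item. Returns the number of
   elements transferred; a value smaller than item_len means the end of
   the stream was reached, mirroring Last < Item'Last in the Ada
   profile. */
size_t stream_read(struct mem_stream *s, unsigned char *item, size_t item_len)
{
    size_t avail, n;
    assert(s != NULL && item != NULL && s->pos <= s->len);
    avail = s->len - s->pos;
    n = item_len < avail ? item_len : avail;
    memcpy(item, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```

A caller that asks for three elements from a five-element stream receives three, then two, then none, and can detect the end of the stream from the short count.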

The Write operation appends Item to the specified stream. There are also
the Read, Write, Output and Input attributes that convert values to a stream of elements
and reconstruct values from a stream.

For  every  subtype  S  of  a  type  T,  some  attributes  are  defined,  which  denote 
either  a  procedure  or  a  function  call.  Figure  5.13  presents  such  attributes. 




-- writes the value of Item to Stream
procedure S'Write(Stream : access Ada.Streams.Root_Stream_Type'Class;
                  Item   : T);

-- reads the value of Item from Stream
procedure S'Read(Stream : access Ada.Streams.Root_Stream_Type'Class;
                 Item   : out T);

-- writes the value of Item to Stream, including any bounds or discriminants
procedure S'Output(Stream : access Ada.Streams.Root_Stream_Type'Class;
                   Item   : T);

-- reads and returns the value of Item from Stream, using any bounds or
-- discriminants written by a corresponding S'Output
function S'Input(Stream : access Ada.Streams.Root_Stream_Type'Class)
   return T;


Figure  5.13.  Stream  Attributes 
b.  Conclusions 

All of the problems that have been discussed in this section have been
addressed in the Ada95 implementation. Therefore, in order to implement the distributed
scheduling model, it is only necessary to follow the directions introduced in Section B. It
is now apparent that the example given in Figure 5.8 had already considered the packages
(System_RPC and Ada_Streams) and procedures (DO_APC) to be introduced with
Ada95. The only part that is not yet clear, because it is dependent upon implementation,
is the marshalling and unmarshalling operations, which will affect the manner in which the
Ada stream is constructed from the parameters passed during the rendezvous with the
write_external entry of the network stream task.

Figure 5.14 presents a pictorial view of the proposed architecture for the
new  Distributed  CAPS  Scheduler. 

Figure 5.14. Architecture for the Distributed CAPS Scheduler

D. CPU SPEED RATIO ISSUES IN A PROTOTYPING ENVIRONMENT

In  a  software  prototyping  environment,  where  the  host  machines  usually  used  for 
prototyping  are  not  similar  to  the  intended  target  machines  (which  may  not  even  be  known 
a  priori),  special  attention  must  be  taken  so  that  erroneous  conclusions  due  to  timing 
problems  during  the  prototyping  are  avoided. 

There  are  two  kinds  of  timing  errors  that  can  be  foreseen  in  a  real-time  system. 
Both  of  them  may  cause  undesirable  system  behavior,  such  as  deadlocks,  buffer  overflows, 
or data inconsistency. The first kind of error has a relative nature, since it is caused by
computational events that occur in an improper sequence. Such errors are solely dependent on
the relative order in which the computations occur, and can be avoided by proper
scheduling of the system [Mok83].

The second kind of error is more subtle, in the sense that it is caused by violation
of some specified timing constraints, such as missing deadlines. In CAPS, since a static
schedule is used to execute the prototype, this problem can only happen if the MET was
inaccurately specified, or if the MET was specified for running in a faster machine. What,
then, is the real meaning of the MET? Is it an absolute value, or is it dependent upon the
machine in which the module is running? Clearly, this is only the tip of the iceberg, and the
answer is that it cannot be absolute, since execution time is a function of the
machine throughput. A module that has an MET of 150 ms for some specific machine
may take longer than that to execute if running in a slower machine.

The problem is even bigger if the CAPS Software Base, which is supposed to be a
collection of reusable components provided by different vendors, is taken into account.
Each  component  should  have  a  PSDL  specification,  with  all  the  timing  constraints,  such  as 
MET,  MRT,  MCP,  etc.  All  of  this  information  will  be  used  during  the  execution  phase  of 
the  prototype,  in  trying  to  match  needs  with  the  available  components.  The  same  problem 
arises  regarding  their  timing  reference,  since  each  vendor  may  well  have  their  own. 

This discussion demonstrates the imperative need for assuming a common timing
reference  within  CAPS.  It  can  be  anything,  as  long  as  it  is  consistent  and  used  throughout 
the  prototype.  Care  must  be  taken  when  choosing  this  reference,  however,  since  it  may 
lead  to  significant  differences  when  dealing  with  reusable  components  from  different 
sources. 

1.  Choosing  a  Reference 

Standard measures of performance provide a basis for comparison, and time is the
best way to measure computer performance. The computer that performs the same
amount of work in the least time is the fastest. A number of popular measures have been
adopted in the quest for a standard measure of computer performance, but most of them
were forced into a service for which they were never intended. [HP90]

The MIPS rating (million instructions per second) is easily understood by a customer, in
that faster machines mean higher MIPS ratings. However, the MIPS measure presents the
following problems:

1) MIPS is dependent on the instruction set, making it difficult to compare
machines with different instruction sets

2) MIPS varies between programs on the same computer

3)  MIPS  can  vary  inversely  to  performance 

A classic example of the third of these points is the MIPS rating of a machine with
optional floating-point hardware. If a program uses the hardware floating-point unit it will
take less time to execute, but it will also execute fewer and more complex instructions.
Software floating-point executes more but simpler instructions, resulting in a higher MIPS
rating [HP90].
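
The inversion can be made concrete with a short C sketch that computes a MIPS rating as instructions executed divided by execution time in seconds times one million. The function name and the instruction counts and run times used in the note below are hypothetical figures chosen only to illustrate the effect; they are not measurements from [HP90] or from CAPS.

```c
#include <assert.h>

/* MIPS rating = instructions executed / (execution time in s * 10^6). */
double mips_rating(double instruction_count, double seconds)
{
    assert(instruction_count >= 0.0 && seconds > 0.0);
    return instruction_count / (seconds * 1.0e6);
}
```

Suppose, hypothetically, that a program executes 2 million instructions in 1 s with the hardware floating-point unit, and 30 million instructions in 5 s with software floating-point. The first run rates 2 MIPS and the second 6 MIPS, although the second run takes five times as long.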

Another popular alternative is million floating-point operations per second,
abbreviated as MFLOPS. However, MFLOPS is, clearly, highly dependent on the
machine and on the program.

Other options are synthetic benchmarks, such as Whetstone and Dhrystone, but the
best choice appears to be to use real programs, such as compilers, text editors, CAD tools,
etc., which have inputs, outputs, and other user-defined options. [HP90]

While having a standard of performance for computers is still beyond the horizon,
for prototyping purposes within CAPS, where many of the figures are still subject to
change during the prototype refinement process, any of these metrics provides a good
starting place. Again, for the sake of simplicity, the MIPS rating will be the reference
model for performance in this work.


2.  CAPS  Timing  Model 

It  will  be  useful  to  define  some  of  the  terms  used  in  construction  of  the  model: 

CAPS  Reference  —Specifies  the  MIPS  rating  of  a  hy]x>thetical  machine,  to  which 
all  of  the  CAPS  timing  information  should  be  normalized. 

HOST  Reference  -  Specifies  the  MIPS  rating  of  the  host  machine  where  CAPS  is 
installed.  This  value  wiU  be  automatically  generated  by  CAPS  at  the  start  of  the  session, 
and  it  is  the  result  of  an  Unix  system  call. 

TARGET  Reference  -  Specifies  the  MIPS  rating  of  the  target  machine.  In  the 
absence  of  this  value,  it  is  assumed  that  the  host  machine  for  CAPS  is  identical  to  the 
target  machine.  This  value  should  be  provided  by  the  user  at  the  beginning  of  the  design 
of the prototype, and will affect the retrieval of reusable components from the Software
Base. 


CPU Speed Ratio - Specifies the MIPS ratio between the target and the host
machine. It can be changed by the user to make temporary simulations and to overcome
possible timing errors. This value plays a very important role in debugging
possible timing errors during prototype execution. Its default value is
given by the formula:


$$\text{CPUSpeedRatio} = \frac{\text{Target Reference}}{\text{Host Reference}}$$

Table 5.1 specifies the default values which will be used throughout this discussion,
unless otherwise stated.


CAPS Reference    Target Reference    Host Reference    CPU Speed Ratio
10 MIPS           20 MIPS             15 MIPS           1.33

Table 5.1. Default Values for the Timing Model

a. Building the Prototype

All timing information, such as MET, PER, FW, MRT, MCP, LAT and
MOP, specified by the user during the design phase of the prototype, which in most cases
comes from the Requirements Document, is assumed to be referenced or normalized to
the Target Reference. Therefore, when, for example, defining an operator with MET =
100ms, it should be understood that 100ms would be the maximum execution time
allowed for that operator if running on the target machine. The reference defaults to the host
machine if the Target Reference is not given.

Note that the MET of this operator is equivalent to 200ms with respect to
the CAPS Reference; it is this value of 200ms that will be used in the query to the
Software Base during the search for a matching reusable component. Observe also that
this value will affect neither Translation nor Scheduling, since all timing information is
consistently and linearly normalized to the CAPS Reference.

b.  Installing  Components  in  the  Software  Base 

When getting reusable components from a specific vendor or supplier, the
timing reference used to classify their components should be specified along with the
component. For example, when a component arrives, it should be labeled as follows:
component X has a certified MET of 100ms under a 5 MIPS machine.

This  information  will  allow  the  insertion  of  the  component  into  the 
Software  Base  as  a  component  with  MET  equal  to  50ms,  which  is  the  correct  value 
normalized  with  respect  to  the  CAPS  Reference.  Note  that  this  value  will  be  used  during 
its  retrieval  from  the  software  base  by  the  prototypes. 
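
The two normalizations described so far can be sketched as follows. This is only an illustration that assumes the linear MIPS scaling stated in the text; the function name `normalize_met_to_caps` is hypothetical and is not part of CAPS.

```python
from fractions import Fraction

CAPS_REFERENCE_MIPS = 10    # default from Table 5.1
TARGET_REFERENCE_MIPS = 20  # default from Table 5.1

def normalize_met_to_caps(met_ms, rated_mips, caps_mips=CAPS_REFERENCE_MIPS):
    """Scale an MET quoted for a machine of `rated_mips` to the CAPS Reference.

    Hypothetical helper: assumes execution time scales linearly with MIPS.
    """
    return Fraction(met_ms) * Fraction(rated_mips, caps_mips)

# Operator specified with MET = 100ms on the 20 MIPS target machine.
query_met = normalize_met_to_caps(100, TARGET_REFERENCE_MIPS)

# Supplier component certified at 100ms on a 5 MIPS machine.
stored_met = normalize_met_to_caps(100, 5)

print(query_met, stored_met)
```

Under these assumptions the query uses 200ms and the supplier component is stored with 50ms, matching the figures quoted in the text.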

3. Relations between CPU Speed Ratio and Timing Errors


Assuming that all timing information from the reusable components is correct with
respect to the supplier's reference, then there should be no timing errors if the component
matches the prototype specification. For example:

Suppose that a component with an MET of 120ms is needed. Then the correct
query to be performed on the Software Base should be for a component with an MET of
240ms, i.e.,

$$\text{MET}_{\text{CAPS}} = \text{MET}_{\text{Target}} \times \frac{\text{Target}_{\text{REF}}}{\text{CAPS}_{\text{REF}}}$$

Therefore,  using  this  component  in  the  prototype,  according  to  the  generated  static 
schedule,  should  not  cause  any  timing  errors.  However,  if  it  does  cause  a  timing  error, 
then  it  is  possible  to  conclude  that  the  component  timing  information  was  incorrect.  To 
solve  this  problem,  the  following  steps  can  be  taken: 

a) Increase the CPU Speed Ratio until the error disappears. This means that a
reasonable MET for that component with respect to the Target reference, although it may
not be the tightest one, is equal to:

$$\text{New MET}_{\text{Target}} = \frac{\text{New CPU Speed Ratio}}{\text{Old CPU Speed Ratio}} \times \text{Original MET}_{\text{Target}}$$


Note  that  another  side  effect  in  performing  step  a)  is  that  the  entire  schedule  is 
stretched,  and,  consequently,  the  slack  time  available  for  the  dynamic  scheduler  is 
increased,  since  some  of  the  timing  critical  operators  don’t  need  more  time  to  execute. 

b) Update the Software Base with the correct timing information for that
component.

c) Reset the CPU Speed Ratio to its original value and take either step d), e) or f)
to solve the problem.

d) If requirements permit, change the PSDL specification to allow the bigger MET
found in step a). This in turn will require a whole new CAPS session, starting from a new
translation until the final compilation. Note that increasing the MET affects the load
factor and may cause an unfeasible schedule.

e) Search the Software Base for another reusable component that matches the
original MET. This new one may well have the correct information.

f) Create another new component or optimize the existing component. Validate its
timing constraints and update the Software Base if successful.

g) If it is realized that a faster target processor is needed in order to cope with the
requirements, then the Target Reference should be changed so that those timing errors
disappear. Note that this change will only affect the CPU Speed Ratio and, as explained
earlier, will not change the schedule. Theoretically, the necessary change for the
Target Reference can be derived very easily from the following formula:

New  Target  Reference  =  New  CPU  Speed  Ratio  x  Original  Host  Reference 
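
As a worked illustration of the two formulas used in steps a) and g), the sketch below applies them to the Table 5.1 defaults. The raised ratio of 1.5 is a hypothetical value picked for the example, and both function names are illustrative only.

```python
from fractions import Fraction

def new_met_target(original_met_ms, old_ratio, new_ratio):
    """Step a): MET implied by the ratio at which the timing error disappears."""
    return original_met_ms * new_ratio / old_ratio

def new_target_reference(new_ratio, host_reference_mips):
    """Step g): Target Reference that yields the new ratio on the same host."""
    return new_ratio * host_reference_mips

old_ratio = Fraction(20, 15)  # Table 5.1: Target Reference / Host Reference
new_ratio = Fraction(3, 2)    # hypothetical ratio at which the error disappeared

relaxed_met = new_met_target(120, old_ratio, new_ratio)
required_target = new_target_reference(new_ratio, 15)

print(float(relaxed_met), "ms")        # MET with respect to the Target Reference
print(float(required_target), "MIPS")  # Target Reference needed instead
```

With these assumed inputs, either the MET is relaxed from 120ms to 135ms, or the Target Reference is raised from 20 to 22.5 MIPS.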

The other source of timing errors is found when dealing with user-created
components. In other words, the component just created takes more time than that
specified. For example, assume the previous situation, where a component with MET of
120ms is required. Since the host machine is slower than the target machine, the
scheduling time will be linearly stretched by a factor of 1.33, that is, 1.33 x 120ms, or
159.6ms, will be allowed for the execution interval of this component. If timing errors
occur, the following steps can be taken to eliminate them:

a) Increase the CPU Speed Ratio until the error disappears. This means that a
reasonable MET for that component with respect to the Target reference, although it may
not be the tightest one, is equal to:

$$\text{New MET}_{\text{Target}} = \frac{\text{New CPU Speed Ratio}}{\text{Old CPU Speed Ratio}} \times \text{Original MET}_{\text{Target}}$$

b) Reset the CPU Speed Ratio to its original value and take either step c) or d) to
solve the problem.

c) If requirements permit, change the PSDL specification to allow the bigger
MET found in step a). This in turn will require a whole new CAPS session, starting from
a new translation until the final compilation. Again, this change may cause an unfeasible
schedule.

d) Rewrite the component, trying to speed it up.

e) If it is realized that a faster target processor is needed in order to cope with the
requirements, then the Target Reference should be changed so that those timing errors
disappear. The required change for the Target Reference can be derived from the
following formula:

New  Target  Reference  =  New  CPU  Speed  Ratio  x  Original  Host  Reference 

f) After getting rid of the timing errors, if it is decided to add the user-created
component to the software base, the component should be associated with an
$\text{MET}_{\text{CAPS}}$ equal to
$\text{MET}_{\text{Target}} \times \frac{\text{Target}_{\text{REF}}}{\text{CAPS}_{\text{REF}}}$.

4.  How  the  CPU  Speed  Ratio  affects  Scheduling 

The Static Schedule is basically a sequence of pairs of absolute values containing
the start time and stop time for each instance of the time-critical operators within one
harmonic block.

At the beginning, the static scheduler task calls the function TARGET_TO_HOST,
which belongs to the package PSDL_TIMERS, and multiplies all those absolute time
values by the CPU Speed Ratio. The net effect is that the scheduler will stretch or shrink
all of the timing information related to the prototype in a linear fashion.
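
The linear stretching can be sketched as below. This is not the Ada source of TARGET_TO_HOST; the Python function of the same name is a hypothetical stand-in that assumes the schedule is a list of (start, stop) pairs in milliseconds.

```python
from fractions import Fraction

def target_to_host(schedule, cpu_speed_ratio):
    """Scale every absolute start/stop time by the CPU Speed Ratio.

    Hypothetical stand-in for the scheduler's conversion step.
    """
    return [(start * cpu_speed_ratio, stop * cpu_speed_ratio)
            for start, stop in schedule]

# Two operator instances scheduled with respect to the target machine.
target_schedule = [(0, 120), (120, 300)]

# Table 5.1 defaults: 20 MIPS target, 15 MIPS host.
host_schedule = target_to_host(target_schedule, Fraction(20, 15))

for start, stop in host_schedule:
    print(float(start), float(stop))
```

Because every value is multiplied by the same factor, the relative order and proportions of the intervals are preserved; a ratio above one stretches the schedule and a ratio below one shrinks it.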

Figure 5.15. Effect of the CPU Speed Ratio on the Schedule

5.  Handling  Unwanted  Interactions  during  Prototype  Scheduling 

A  software  prototyping  environment  needs  to  simulate  external  entities  so  that  the 
entire  system  being  prototyped  can  be  exercised.  These  external  entities  will  in  most  cases 
either  generate  inputs  or  consume  outputs  from  the  core  of  the  system  being  prototyped. 
This  requires  that  the  timing  constraints  are  taken  into  consideration  during  the  generation 
of the schedule. However, it is during prototype execution that the effects are most
harmful, since these entities will incorrectly steal CPU time from the host system. It is also
unavoidable that time is spent by the host operating system to serve processes that
sometimes have nothing to do with the prototype.

All these unwanted interactions can dramatically affect timing behavior and overall
confidence in the prototype. The question to ask, then, is how can these timing
interferences be eliminated?

To solve these problems, CAPS introduced the technique of having two different
time lines. One is the absolute time line, which is driven by the real-time clock of the host
machine. The second one, the simulation time, will command all the scheduling actions of
the prototype.


What  is  going  to  happen  is  that  whenever  an  external  operator,  or  some  operating 
system  function,  is  being  executed,  the  scheduling  clock  will  be  frozen,  so  that,  for  the 
prototype,  it  is  as  though  they  do  not  exist. 

Another  feature  that  can  be  explored  with  this  technique  is  when  an  operator 
belonging  to  the  prototype  exceeds  its  scheduling  interval  and  causes  an  exception.  It  is 
very  likely  that  this  will  interfere  with  other  operators,  causing  a  chain  of  exceptions,  when 
in reality, only the very first operator incurred a timing error. Because of the use of a
simulated clock (the scheduling clock) it is possible to remove any excess of time from the
scheduling clock, and then resume the simulation, so that no further operators will be
affected.
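
The two time lines can be sketched with a small model. The class and method names below are hypothetical and do not come from CAPS; elapsed time is passed in explicitly so the example stays deterministic.

```python
class SchedulingClock:
    """Toy model of an absolute time line and a freezable simulation time line."""

    def __init__(self):
        self.absolute_ms = 0    # driven by the host real-time clock
        self.simulation_ms = 0  # drives the prototype's scheduling actions
        self.frozen = False

    def freeze(self):
        """Stop simulation time, e.g. while an external operator runs."""
        self.frozen = True

    def resume(self):
        self.frozen = False

    def tick(self, elapsed_ms):
        """Advance absolute time; simulation time advances only when not frozen."""
        self.absolute_ms += elapsed_ms
        if not self.frozen:
            self.simulation_ms += elapsed_ms

    def remove_excess(self, excess_ms):
        """Discard an operator's overrun so later operators are not affected."""
        self.simulation_ms -= excess_ms


clock = SchedulingClock()
clock.tick(100)          # prototype operator runs within its interval
clock.freeze()
clock.tick(30)           # external operator or OS activity, invisible to the prototype
clock.resume()
clock.tick(70)           # operator allotted 50ms overruns by 20ms
clock.remove_excess(20)  # the overrun is removed from the scheduling clock
print(clock.absolute_ms, clock.simulation_ms)
```

In this run 200ms of absolute time pass, while the scheduling clock records only the 150ms that the prototype was entitled to.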

VI. EXPERIMENTAL RESULTS


A.  INTRODUCTION 

Although the full implementation of the new Distributed Model is not complete,
due to software limitations of the current Ada compiler technology that will be solved by
the new Ada95 implementation, much can be said about expectations and also about the
general scheduling capability of CAPS.

One  of  the  biggest  problems  encountered  during  this  research  was  the  lack  of  an 
adequate  set  of  prototypes  to  test  the  schedule.  Up  to  now,  most  of  the  development  in 
CAPS  has  been  tested  with  a  few  prototypes  that  may  be  sufficient  for  the  development  of 
several  tools,  but  not  for  the  scheduler,  which  requires  a  huge  test  set  so  that  all  the 
critical  points  can  be  exercised.  This  is  the  reason  for  building  a  PSDL  random  graph 
generator,  as  discussed  in  the  next  section  of  this  chapter. 

B.  THE  RANDOM  GRAPH  GENERATOR 

The  random  graph  generator  has  the  following  basic  features: 

1)  builds  PSDL  prototypes  with  an  arbitrary  number  of  operators 

2)  allows  the  user  to  specify  how  many  different  prototypes  are  to  be  generated 

3) provides an expert mode where the system attempts to reduce the harmonic
block automatically, by changing the periods of the periodic and the
transformed sporadic operators within a user-defined range

4)  operates  in  two  randomization  modes:  unlimited  or  restricted  randomization 

5) provides a compression capability, so that an arbitrary number of operators may
be located within a bounded load factor of one. This is very useful for testing
uniprocessor scheduling algorithms

6)  allows  the  user  to  specify  the  desired  percentage  of  timing  critical  operators, 
periodic  operators,  and  data  flow  edges 

7)  can  generate  prototypes  with  different  degrees  of  sparseness 

8)  the  user  can  specify  the  maximum  number  of  edges  between  two  operators 

9) provides thorough scheduling information for debugging purposes

There are basically two major procedures that build the random graph. The first
one is Produce_Random_Array and the second one is Produce_Random_Matrix.
Both routines use the same data structure as the scheduler, so that the simulation is as
close as possible to the real prototype.

Figure 6.1. Partial View of the Data Structure Used to Build the Random Graph.


The first procedure, Produce_Random_Array, is the one that actually randomly
assigns the timing constraints to the random prototype. It has two modes of operation.
The first one uses a partial randomization, in the sense that only values from a pre-defined
set are assigned to the timing constraints. The second mode uses a full randomization, so
that any value within a finite range previously specified can be assigned.

It is in this procedure that most of the information provided by the user, such as
number of prototypes to be generated, number of operators in each prototype, percentage
of timing critical operators, mode of randomization, percentage of periodic operators, and
compression factor, is used.

In the current implementation, the restricted randomization mode generates five
possible different values for MET (100, 300, 500, 700, and 1000) and four values for each
of the remaining timing constraints PER, FW, MCP and MRT, which are dependent upon
the previously chosen value for the MET. This was done in order to assure semantic
compatibility with a valid PSDL prototype.

If one opts for unlimited randomization, then no restriction is imposed on timing
constraints, other than limiting their values to a reasonable range, which now stands
between 0 and 8000 ms.

The random number generator being used has a period of approximately 2*^, so in
order to achieve better results it is not reset after the generation of each different
prototype.

The expert mode is a facility that allows the user to automatically reduce the final
harmonic block length of the prototype, substantially increasing the schedulability of the
prototype. For more in-depth information, refer to Chapter III, Section E.

The compression factor is used so that, if the prototype happens to have a load
factor bigger than one (which would mean that it couldn't run in a uniprocessor system),
then the timing constraints are compressed accordingly. This feature allows us
to test huge prototypes for uniprocessors that otherwise, due to the random nature of the
graph, would be very hard to obtain.
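
One way such a compression could work is sketched below. The text does not say which timing constraints are adjusted, so scaling every MET by the same factor is an assumption made here for illustration, as are the function names and the task values.

```python
from fractions import Fraction

def load_factor(tasks):
    """Sum of MET / PER over all (met, period) pairs."""
    return sum(Fraction(met, period) for met, period in tasks)

def compress(tasks, bound=Fraction(1)):
    """Scale all METs uniformly so that the load factor does not exceed `bound`.

    Assumption: only the METs are compressed; periods are left untouched.
    """
    load = load_factor(tasks)
    if load <= bound:
        return list(tasks)
    scale = bound / load
    return [(met * scale, period) for met, period in tasks]

# Hypothetical random task set whose load factor exceeds one.
tasks = [(300, 500), (500, 700), (100, 300)]
compressed = compress(tasks)

print(float(load_factor(tasks)), float(load_factor(compressed)))
```

Exact rational arithmetic is used so that the compressed load factor lands exactly on the bound rather than slightly above it through rounding.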

The  second  main  procedure,  Produce_Random_Matrix,  is  where  artificial  edges 
are  randomly  generated  according  to  the  degree  of  sparseness  and  the  maximum  number 
of edges defined by the user. It is also here that the latency for each edge is generated.

C. FIRST FINDINGS AFTER USING THE RANDOM GRAPH GENERATOR

The  first  finding  after  using  the  random  graph  generator  was  that  the  scheduling 
capability  of  the  existing  CAPS  scheduler  is  very  poor.  It  is  not  likely  that  the  scheduler 
will  find  a  feasible  schedule  for  a  moderate  size  prototype  without  manual  adjustment  of 
all timing constraints after a long and tedious process of trial and error. That in itself is not
surprising because, after all, the static scheduling problem is a well known NP-Hard
problem. The interesting thing, however, is that even for very small prototypes, with as
few as 4 or 5 operators, and also a very limited number of edges, it still couldn't find a
feasible schedule, even through the use of traditional and widely accepted algorithms, such
as earliest start time first and earliest deadline first, modified for the non-preemptive
case. The question to be asked is, "Why does that happen, and how can we improve it?".

After meticulous analysis of several runs, with hundreds of random prototypes, it
was determined that, on average, the earliest deadline first algorithm finds a feasible
schedule for prototypes with load factors less than 0.5. It was also noticeable that the
schedulability of the prototype was affected somehow by the harmonic block length (HB).
There were some cases where, even with load factors over 0.95, after optimizing the HB
to smaller values, it was possible to find a feasible schedule, which could not be achieved
with the bigger HB. The load factor definitely has a strong influence on schedulability.
For the harmonic block, however, it was not thought that the influence would be so great.
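
To make the kind of experiment concrete, the sketch below builds a non-preemptive earliest deadline first schedule over one LCM of the periods. It is a simplified illustration and not the CAPS scheduler: it assumes independent periodic tasks given as (MET, period), deadlines equal to periods, all tasks released at time zero, and no precedence edges. It requires Python 3.9 or later for `math.lcm`.

```python
from math import lcm

def edf_non_preemptive(tasks):
    """Return [(task, start, stop), ...] over one LCM, or None if a deadline is missed."""
    horizon = lcm(*(period for _, period in tasks))
    # Every instance inside the horizon as (release, deadline, task index),
    # kept sorted by release time.
    pending = sorted(
        (k * period, (k + 1) * period, i)
        for i, (_, period) in enumerate(tasks)
        for k in range(horizon // period)
    )
    schedule, now = [], 0
    while pending:
        ready = [inst for inst in pending if inst[0] <= now]
        if not ready:
            now = pending[0][0]  # idle until the next release
            continue
        release, deadline, i = min(ready, key=lambda inst: inst[1])
        stop = now + tasks[i][0]  # runs to completion, no preemption
        if stop > deadline:
            return None
        schedule.append((i, now, stop))
        pending.remove((release, deadline, i))
        now = stop
    return schedule

feasible = edf_non_preemptive([(1, 4), (2, 8)])
blocked = edf_non_preemptive([(3, 10), (1, 2)])  # load factor is only 0.8
print(feasible, blocked)
```

The second task set fails even though its load factor is well below one, because the long operator blocks the short-period one once it starts. This is one simple way a small prototype can turn out to be unschedulable in the non-preemptive case.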

There are two readily apparent explanations for the harmonic block syndrome.
The first is that, because of the higher number of instances that can fit in a bigger HB, the
probability of having two or more tasks fighting for the same time slot increases. The
second explanation is partially supported by Theorem 6 in Chapter III, where it is evident
that increasing the period of an operator, which might happen when its period is
optimized, also has the effect of increasing the probability of finding a feasible schedule.

The following problems are now apparent: first, how to decrease the load factor
of our prototype; and second, how to decrease its associated harmonic block.

The total load factor of the prototype cannot be changed much, since it comes
from the user's requirements. Splitting them into multiple processors will not do much
good in the current practice for non-preemptive static distributed scheduling, which
requires a global schedule for the entire prototype in order to satisfy all synchronization
requirements.

In  order  to  change  the  harmonic  block,  assuming  that  the  METs  cannot  be 
changed,  it  is  necessary  to  modify  the  periods,  but  recall  that  they  are  constrained  by  the 
user’s  requirements.  However,  if  we  take  a  close  look  at  these  problems  it  is  possible  to 
realize that they are quite different.

Assume that the requirements allow for making small changes in the periods, which
is a fair assumption, since in most systems it does not really matter if the period of
some task is 1000 ms or 1010 ms. The effect of such a period change on the load factor
is clearly very small, while for the harmonic block it may represent a very big change, since
it may get rid of some prime factor that was driving up the least common multiple (LCM)
of the periods. Following this line of reasoning a novel technique to decrease the
harmonic block was discovered, and will be described in the next section.
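
A small numeric check of this argument is given below. The three-task set is hypothetical; only the 1010 ms versus 1000 ms period change is taken from the text. It requires Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm

def load_factor(tasks):
    """Sum of MET / PER over all (met, period) pairs."""
    return sum(Fraction(met, period) for met, period in tasks)

# Hypothetical task set as (MET, period) in ms; only the first period differs.
before = [(100, 1010), (100, 500), (100, 250)]
after = [(100, 1000), (100, 500), (100, 250)]

hb_before = lcm(*(period for _, period in before))
hb_after = lcm(*(period for _, period in after))
load_change = load_factor(after) - load_factor(before)

print(hb_before, hb_after, float(load_change))
```

Changing one period by 10 ms removes the prime factor 101 from the LCM, so the harmonic block drops from 50500 ms to 1000 ms while the load factor moves by less than 0.001.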

D.  MINIMIZING  THE  HARMONIC  BLOCK 

The need for a harmonic block comes from the fact that, unlike most of the
problems in classical scheduling, this periodic task set contains an infinite number of
instances. Therefore, in order to calculate a static schedule for the task set, it is necessary
to find a time interval which can be repeated forever. When the completion times of the
first instances are restricted to be less than or equal to the periods, it is common to take
the least common multiple (LCM) of the periods as the harmonic block, that is, as such an
interval. However, when those restrictions on the deadlines do not apply, it has been
proven in Chapter III, Section C that it is sufficient to increase the time interval to twice
the LCM. At any rate, the point to be made is that in both cases the size of the LCM is
critical and, for the reasons explained in the previous section, it is desirable to make it as
small as possible.

Formally, the least common multiple of two natural numbers i and j is the smallest
natural number that is divisible by both i and j. It is also known from Number Theory that
every integer greater than one can be written uniquely as a product of primes, where each
prime is raised to some positive integer power.

From  the  above  definitions,  it  can  clearly  be  seen  that  the  LCM  of  two  natural 
numbers  i  and  j  will  have  in  its  prime  factorization  all  of  the  prime  factors  of  the  original 
numbers  raised  to  the  maximum  exponent,  as  shown  in  the  following  example. 

Example: 

$i = 120 = 2^3 \times 3 \times 5$

$j = 100 = 2^2 \times 5^2$

$\mathrm{LCM}(i, j) = 2^3 \times 3 \times 5^2 = 600$
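
The same example can be checked mechanically. The helper names below are illustrative; trial division is used only because the periods involved are small. It requires Python 3.9 or later for `math.lcm`.

```python
from collections import Counter
from math import lcm, prod

def prime_factors(n):
    """Prime factorization of n as {prime: exponent}, by trial division."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def lcm_from_factors(*numbers):
    """LCM built by taking, for each prime, the maximum exponent seen."""
    merged = Counter()
    for n in numbers:
        merged |= prime_factors(n)  # Counter union keeps the larger exponent
    return merged, prod(p ** e for p, e in merged.items())

factors, value = lcm_from_factors(120, 100)
print(dict(factors), value, lcm(120, 100))
```

The merged factorization is the data that a period optimizer needs: it shows which prime power, and therefore which period, is responsible for each factor of the LCM.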

This same approach can be extrapolated to a case where several numbers are
present, instead of only two. So now the problem is decreasing the LCM of a set of
periods.

There are two basic approaches. The first one is trying to decrease the factor with
the biggest prime, and the other is decreasing the biggest prime factor. Clearly, the second
approach is more expedient, but still leaves the following problem. Suppose all of the
periods which are contributing to the factors in the LCM are identified, and have been
placed into a critical list, with some kind of mapping to the factors they are affecting.
Now, assume that the period which is contributing to the biggest factor is changed. With
luck, that biggest factor may be eliminated. However, the exponent of some other prime
factor from that same period may be increased, now becoming the critical one for the
LCM. In other words, it is necessary to re-evaluate the critical list and the corresponding
mapping after each iteration of the optimization process, or one may end up with a
non-optimal solution.

After  this  brief  description  of  the  problem  statement,  it  is  possible  to  introduce  the 
algorithm for optimizing the LCM, which is presented in Figure 6.2.

Algorithm Optimize_LCM

   For every period calculate its prime factors;
   Calculate the initial LCM for the periodic task set and its prime factors;
   Set the flag LCM is decreasing to false;

   While there exists a prime factor of the LCM not yet optimized loop

      Insert those tasks whose periods are contributing to the LCM factors into the
      Critical List in decreasing order of their contribution. In other words, the head of
      the Critical List will be the task with the biggest contribution to the LCM;

      While the Critical List is non-empty loop
         Pick the task which is at the head of the Critical List;
         Remove its contribution from the LCM;
         For each period within its allowable range loop
            Calculate the new LCM;
            If LCM is decreasing then record this period as the best one so far;
         end loop;
         If LCM is decreasing then
            Update the new LCM and the task prime factors;
         end if;
         Remove this task from the Critical List;
      end loop;

      If LCM is decreasing then
         It means that some critical task in the Critical List had its period changed
         and consequently reduced the LCM. Now comes the subtle part: even if we had
         some period in the Critical List that couldn't have its biggest factor
         changed, so that the LCM could decrease, it needs to be reconsidered, since
         the order in which the Critical List was scanned matters. In other words,
         after all the others in the Critical List have been processed, it may well now
         be possible to change that same task so that the LCM will be decreased. So,
         we need to calculate the new LCM and start all over again;
      else if LCM is not decreasing
         It means that none of the critical tasks in the Critical List were able to get rid
         of their biggest factor, and so there is nothing else to do other than skip to
         the second biggest factor, and so forth;
      end if;

      Set LCM decreasing flag back to false;
   end loop;

end Algorithm Optimize_LCM;

Figure 6.2. Algorithm for Optimizing the LCM
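
The idea behind Figure 6.2 can be approximated with the simplified greedy sketch below. It is not a transcription of Optimize_LCM: it works directly on LCM values instead of maintaining prime factor lists, it measures a task's contribution as the factor by which the LCM shrinks when that task is removed, and it models the allowable range as a symmetric `tolerance` around the original period. All of these are assumptions made for illustration. It requires Python 3.9 or later for `math.lcm`.

```python
from math import lcm

def lcm_without(periods, skip):
    """LCM of all periods except the one at index `skip`."""
    return lcm(*(p for j, p in enumerate(periods) if j != skip))

def optimize_lcm(periods, tolerance):
    """Greedily adjust periods within +/- tolerance to shrink their LCM."""
    original = list(periods)
    current = list(periods)
    improved = True
    while improved:  # rescan from the start whenever the LCM decreased
        improved = False
        total = lcm(*current)
        # Critical list: tasks in decreasing order of contribution to the LCM.
        critical = sorted(range(len(current)),
                          key=lambda i: total // lcm_without(current, i),
                          reverse=True)
        for i in critical:
            others = lcm_without(current, i)
            best_period, best_lcm = current[i], lcm(*current)
            low = max(1, original[i] - tolerance)
            for candidate in range(low, original[i] + tolerance + 1):
                new_lcm = lcm(others, candidate)
                if new_lcm < best_lcm:  # record the best period so far
                    best_period, best_lcm = candidate, new_lcm
            if best_period != current[i]:
                current[i] = best_period
                improved = True
    return current, lcm(*current)

new_periods, new_hb = optimize_lcm([1010, 500, 250], tolerance=10)
print(new_periods, new_hb)
```

The outer loop terminates because a period is only changed when the LCM strictly decreases. Like the algorithm in Figure 6.2, this sketch makes no claim of optimality.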

Although  its  optimality  has  not  been  formally  proven,  it  is  believed  that  this 
algorithm  will  always  lead  to  near-optimal  results.  By  applying  this  algorithm  to  some 
random  task  sets  it  was  possible  to  tremendously  reduce  the  harmonic  block,  with  some 
positive effects in schedulability. It should also be noted, by the examples shown below,
that the periods are of critical importance. With very few changes in the periods an
enormous decrease in the LCM can be achieved, with consequently few effects on the load
factor.

Figure 6.3. Optimization Results

E. THE NEW DISTRIBUTED SCHEDULING ALGORITHM - SOME
RESULTS

After running several hundred prototypes with typical values for the timing
constraints (such as MET, MRT, MCP and PER) it was possible to draw several
conclusions in addition to those already cited in the previous sections. One of them, and
actually the main driving force for directing us to distributed scheduling, was the palpable
necessity for prototypes with load factors bigger than 1.0, specifically in our applications
domain.

Another major point discovered after this research is the real need for supporting
and advising the real-time system designer, mainly with respect to the values for the timing
constraints. Remember that non-preemptive static scheduling is a well known NP-Hard
problem, so that unless P=NP, there is not much hope of finding better ways to solve this
problem. That is why, sometimes, in prototypes with only two nodes, it was impossible to
find a feasible schedule.

So, what is really needed is to find better ways to live with this problem. One of
the ways to accomplish this is by providing better support in the area of schedulability
tests, which is also a known NP-Hard problem. That is why several theorems were
presented in Chapter III, which, it is hoped, will help in finding and pin-pointing some of
the problems in the user's design.

It is possible, by making use of those theorems, to suggest changes in the timing
constraints of a set of tasks, or even in a specific task, to suggest different partitions so
that some tasks are kept together due to the similarities of their timing constraints, etc.

Now the scheduler can handle prototypes with load factors bigger than one, by
applying the allocation algorithm described in Chapter V. The user can either specify the
maximum load factor allowed per processor, or the number of processors. It is also
capable of generating a schedule, if one can be found, by using a distributed version of the
Earliest Deadline First algorithm. By making use of the Fundamental Synchronization
Theorem it is now possible to divide the schedule into several smaller schedules, so that its
complexity is tremendously decreased.

The robustness of the new scheduler is enhanced due to the extensive testing that was
made possible by the random graph generator. Several important bugs were found during
these experiments. It was possible to analyze and compare the performance of the
different uniprocessor scheduling algorithms currently implemented in CAPS. The output
generated by the scheduler is now more comprehensive, improving the debugging
capability.

An expert mode is provided to the designer, so that the harmonic block will be
decreased with some effects on the load factor. A possible enhancement for the expert
mode is to combine it with the actual scheduling. In other words, instead of applying the
optimization algorithm to the entire task set in only one step, prior to the scheduling, an
attempt should be made to schedule the task set after each optimization iteration.

As  can  be  seen,  quite  a  lot  has  been  accomplished  towards  a  more  dependable  and 
reliable  scheduler,  but  much  more  needs  to  be  done  so  that  CAPS  can  become  a  true 
design  aid  to  real-time  system  designers. 

VII. CONCLUSIONS AND RECOMMENDATIONS


A.  SUMMARY  OF  THE  DISSERTATION 

This dissertation can be roughly divided into three parts. The first part (Chapters I
through III) presents a review of the most important results in the area of hard real-time
scheduling and introduces several theorems to improve the schedulability analysis of task
sets containing both periodic and sporadic tasks. The effects of precedence relationships
among the tasks on these theorems are also analyzed. Although most of the work was done
for the non-preemptive model, several results are also applicable to the preemptive case,
as highlighted throughout the dissertation. The second part of the dissertation (Chapter
IV) introduces the novel method of hard real-time distributed scheduling without explicit
synchronization. The motivation for this new approach is the complexity of the hard
real-time scheduling problem, where for even small size systems running in a uni-processor
environment, it is extremely hard to find a feasible schedule. With the addition of one
more variable, such as distributed processing, the general scheduling problem becomes
intractable, and unless P=NP, there is no reason to foresee any solution to this problem. It
was therefore decided to sacrifice timing constraints in order to decrease the complexity of
the scheduling problem. Depending on the application, this approach may not be
applicable. However, this approach should work in most cases, especially in prototyping,
which is usually in the early stages of the life cycle of the system, allowing for the fine
tuning of timing requirements. The third part of the dissertation deals with the
architectural aspects of implementation of a distributed real-time scheduler without
making use of any explicit synchronization. The following paragraphs present a summary
of the salient results found in each chapter.

Chapter  I  highlights  the  increasing  demand  for  real-time  systems  in  life-critical 
areas  that  were  heretofore  unexplored.  Some  basic  definitions  for  hard  real-time  systems 
are  also  introduced,  and  a  taxonomy  for  scheduling  is  proposed.  Past  research  in  real-time 
scheduling is reviewed and the major results are listed in tabular form. A brief note shows
that the complexity of scheduling algorithms which run in polynomial time for a
non-periodic task set becomes exponential when dealing with periodic task sets. Some
complexity results for message routing in hard real-time distributed systems are also
presented.

Chapter II presents a brief discussion of the Computer Aided Prototyping System
(CAPS), which is a software engineering tool for developing prototypes of real-time
systems. The Prototyping System Description Language (PSDL) and its facilities for
modeling real-time systems are also described in this chapter.

Chapter III formalizes the real-time scheduling problem for periodic and sporadic
task sets. It starts by introducing the scheduling model that will be used throughout the
dissertation, and proceeds with the presentation of several theorems for improving the
schedulability analysis of tasks with hard deadlines. The three most important results in
this chapter are established by Theorems 6, 7, and 8. The Task Demand Theorem
(Theorem 6) specifies necessary conditions for task sets with arbitrary deadlines and
release times to be schedulable. It is also shown that if release times are taken into
consideration, due to precedence relations, for example, the conditions are no longer
necessary, but only sufficient. Theorem 7 extends this result and proves that any periodic
or sporadic task set satisfying the conditions of Theorem 6 can be scheduled with the
Earliest Deadline First (EDF) algorithm, thus making the conditions specified in Theorem
6 necessary and sufficient. The Harmonic Block Theorem (Theorem 8) introduces the
novel concept of transient and cyclic schedules, which is an enhancement of the traditional
method for calculating a cyclic schedule, if one exists. It is shown by example that this
new method finds schedules for task sets which were found to have no
feasible schedule by the traditional method. Later in the chapter all previous results are
reanalyzed for the case where precedence relationships exist among the tasks. Theorem 8 is
also extended to handle the situation where latencies are involved in the scheduling. Note
that the net effect of introducing latencies in the problem is that the schedule can no longer
be assumed to have no inserted idling time in the interval [0, LCM]. Finally, a
methodology to convert sporadic operators into equivalent periodic ones is presented,
along with some important considerations about this conversion.

Chapter IV presents an in-depth discussion of the communication between two
PSDL operators connected by some kind of data stream.
The synchronization problem between producers and consumers is carefully analyzed, as is
the underlying meaning of missing a deadline within the context of a real-time system.
The conclusion reached is that missed deadlines are always attached to data that is not
generated or consumed at the proper time. This data-oriented approach to the synchronization
problem leads to the new distributed scheduling model with no explicit
synchronization, which is formalized by the Fundamental Synchronization Theorem
(Theorem 9). The application of this theorem allows each set of tasks allocated to a
particular processor to be treated as a totally independent set, provided that some more
stringent timing constraints are satisfied. This approach will greatly decrease the
scheduling complexity of large distributed real-time systems, although it may be applicable
as well to cases involving uni-processors or shared memory multiprocessors. The chapter
ends with some considerations about the allocation model implemented for the
distributed scheduler in CAPS.

Chapter V presents the current implementation of the CAPS uni-processor
scheduler and proposes an architecture for implementing the full version of the
distributed scheduler. It describes two options for implementing the distributed version.
The first is to use the currently available C libraries for implementing the communication
sub-system. Several problems with this approach are also addressed. The second option
relies on the availability of a full Ada95 compiler, which, according to Annex E of the
Ada95 Reference Manual, will support communication between tasks running on
different processors. In the last section of this chapter several considerations
are presented regarding the timing problems involved in a typical software prototyping
environment. Topics such as simulated time, a normalized reference for time information,
timing errors, and why they happen are covered in this section.

Chapter VI presents experimental results of the partially implemented distributed
scheduler in CAPS. The random PSDL graph generator, which was one of the important
factors for a better understanding of the scheduling problems in CAPS, is described.
Finally, an important issue is discussed which is not given enough attention by most
researchers, namely, the least common multiple (LCM) of the periods of a periodic task
set, which ultimately determines the size of the cyclic schedule for the task set. It is
demonstrated that, by making minor changes in the original periods, the final LCM and,
consequently, the solution space of the corresponding scheduling problem can be
drastically reduced.
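
The sensitivity of the LCM to the individual periods can be illustrated with a short calculation. The sketch below is only an illustration: the period values are hypothetical and are not taken from the experiments in Chapter VI, and the code does not reproduce the optimization algorithm itself. It simply compares the LCM before and after two periods are changed by less than one percent.

```python
from functools import reduce
from math import gcd


def lcm_of_periods(periods):
    """Return the least common multiple of a list of positive integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


# Hypothetical periods (in ms), chosen only for illustration.
original_periods = [100, 150, 199, 401]
# Minor changes: 199 -> 200 and 401 -> 400, each under 1%.
adjusted_periods = [100, 150, 200, 400]

original_lcm = lcm_of_periods(original_periods)
adjusted_lcm = lcm_of_periods(adjusted_periods)

print(original_lcm)  # 23939700
print(adjusted_lcm)  # 1200
```

Note that shortening a period (401 to 400 here) raises the load factor slightly, so whether a given change is acceptable depends on the timing requirements of the application.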

Chapter VII is the conclusion, but it also proposes some modifications for CAPS,
so that it can become a more dependable and reliable design tool for building real-time
systems.

B.  POSSIBLE  CAPS  MODIFICATIONS 

As  a  result  of  this  dissertation,  several  weaknesses  and  areas  requiring 
improvement  within  the  entire  CAPS  and  PSDL  were  identified.  Many  errors  in  the  static 
scheduler  were  corrected,  but  others  require  further  effort. 

1.  Enhancing  the  CAPS  Syntax  Directed  Editor  (SDE) 

As discussed in Chapter IV, several semantic checks for the input PSDL program
are currently enforced by the scheduler. It seems reasonable, however, to allow most of
these checks to be enforced by the SDE. This approach would allow the user to detect
problems and receive warnings about the design in the early stages of prototyping. In doing
so, the designer would not have to go all the way back to the SDE when a semantic error
was found by the scheduler.

2.  Tasks  with  Soft  Deadlines 

In CAPS there are only tasks with hard deadlines (TC) or tasks with no deadlines
at all (NTC). In real-time systems, however, there is often a third kind of deadline which,
if missed for some reason, does not cause any harm to the system. This is known as a
“soft deadline”. Right now, for example, an NTC operator can starve for a long time
before its execution. This was certainly not the intention of the designer when the
operator was placed in the prototype. This anomaly happens because the Non-Time
Critical (NTC) operator depends on the time left by the static scheduler, which can be
none if the load factor is 1.0 and all the TC operators use their entire MET.
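
The starvation case can be made concrete with a small calculation. The sketch below assumes the usual definition of the load factor as the sum of MET/period over the TC operators, and assumes that one schedule cycle spans the LCM of the periods; the operator values and function names are hypothetical. Under those assumptions, the time left for NTC operators in one cycle, when every TC operator uses its entire MET, is the cycle length multiplied by (1 - load factor).

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def load_factor(operators):
    """Sum of MET/period over the time-critical operators, as an exact fraction."""
    return sum(Fraction(met, period) for met, period in operators)


def worst_case_idle_time(operators):
    """Idle time left in one cycle (LCM of the periods) if every operator uses its full MET."""
    cycle = reduce(lambda a, b: a * b // gcd(a, b), (p for _, p in operators), 1)
    return cycle * (1 - load_factor(operators))


# Hypothetical (MET, period) pairs in ms.
fully_loaded = [(50, 100), (100, 200)]    # load factor 1
lightly_loaded = [(20, 100), (50, 200)]   # load factor 9/20

print(worst_case_idle_time(fully_loaded))    # 0: an NTC operator never runs
print(worst_case_idle_time(lightly_loaded))  # 110 ms available per 200 ms cycle
```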

The  implementation  of  tasks  with  soft  deadlines  or  some  other  approach,  like  the 
time-value  functions  presented  in  Chapter  I,  would  greatly  improve  the  scheduling 
capability  of  prototypes  in  CAPS. 

3.  Preemptive  Static  Scheduling 

So far this option has not been used in CAPS because of the Ada83 tasking
model, which prevents tasks with higher priority from changing their relative position in the
FIFO queue of a rendezvous. Ada95, however, allows dynamic changes in the queue
according to task priority and, therefore, the preemptive model again becomes a valid and
reasonable option for the CAPS scheduler. Note that, in general, the preemptive
scheduling problem is easier to deal with than the non-preemptive one, allowing much
better scheduling results. Further research is needed, but it appears that allowing a
mixture of preemptive and non-preemptive tasks is the best approach available.

4.  Triggering  Conditions  versus  Stream  Types 

Currently, in the PSDL model a sampled stream does not guarantee that the data is
not lost or replicated. In the same model, however, the stream type is determined from
the triggering condition of the consumer operator, e.g., an operator with a TRIGGERED
BY SOME condition is supposed to guarantee that its output is based on the most recent
value of the input sampled stream, which is to some extent a contradiction. Our
suggestion is to separate triggering conditions from the type of the streams, so that there
can be a more orthogonal grammar for PSDL. A sampled stream should be defined as a
stream where the data can be read zero or more times, whereas in a data flow stream it can
be read once and only once. It is understood that this definition better conveys the real
meaning of a stream, since a stream by itself should not guarantee whether or not the data
is lost; the stream is simply a mechanism to transfer data.

Once  the  idea  of  separating  triggering  conditions  from  stream  types  is  accepted,  it 
is  necessary  to  check  which  are  the  valid  combinations.  These  combinations  are  presented 
in Table 7.1, and should be considered valid unless an exception is noted.


                    TRIGGERED BY ALL    TRIGGERED BY SOME    NO TRIGGER

DATA FLOW STREAM    OK                  (2)                  (3)

SAMPLED STREAM      (1)                 OK                   OK

Table 7.1. Triggering Condition and Stream Type Combinations


(1) Assume an operator A TRIGGERED BY ALL X,Y, where X and Y are
sampled streams. Suppose data arrived only in X. It is necessary to wait for new data in
Y, but after A is fired both pieces of data are consumed, and the old data cannot be used
again; otherwise it is impossible to know which data is new or old. The
existence of this case therefore does not make sense. The only situation where this combination
would be needed is if combinations of TRIGGERED BY SOME and TRIGGERED BY
ALL are allowed to exist for the same operator. Note, however, that this combination can
always be implemented in two steps and with one additional operator.

(2)  Assume  an  operator  A  TRIGGERED  BY  SOME  X,Y,  where  X  and  Y  are 
data  flow  streams.  Suppose  only  X  gets  new  data.  Operator  A  will  fire  and  consume  the 
data  in  X,  leaving  nothing  behind  because  it  is  data  flow.  When  new  data  comes  in  Y, 
there  is  nothing  in  X,  and  an  underflow  will  occur. 

(3) This combination does not make sense: if there is no trigger, how can the consumer be
guaranteed to always catch new data that comes into the data flow stream?
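
A check of this kind could be carried out mechanically, for instance by the SDE. The sketch below is a minimal illustration, assuming the three exceptions described in notes (1) through (3) are the only flagged pairings; the identifiers and the function name are hypothetical and are not PSDL or CAPS names.

```python
# Pairings flagged by notes (1)-(3); every other pairing is treated as valid.
# The names below are illustrative and are not PSDL or CAPS identifiers.
INVALID_COMBINATIONS = {
    ("sampled", "by_all"): "note (1): old and new data cannot be told apart",
    ("data_flow", "by_some"): "note (2): an underflow can occur",
    ("data_flow", "no_trigger"): "note (3): new data may be missed",
}

STREAM_TYPES = ("sampled", "data_flow")
TRIGGERS = ("by_all", "by_some", "no_trigger")


def check_combination(stream_type, trigger):
    """Return None if the pairing is valid, otherwise the reason it is flagged."""
    if stream_type not in STREAM_TYPES or trigger not in TRIGGERS:
        raise ValueError(f"unknown combination: {stream_type!r}, {trigger!r}")
    return INVALID_COMBINATIONS.get((stream_type, trigger))


for stream_type in STREAM_TYPES:
    for trigger in TRIGGERS:
        reason = check_combination(stream_type, trigger)
        print(stream_type, trigger, "OK" if reason is None else reason)
```

Returning the reason rather than a plain boolean lets a tool issue a warning that points the designer at the relevant note.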

5.  Estimating  the  Execution  Time 

As explained earlier, the MET is an upper bound on the execution time of an
operator, and it is this value which is used by the scheduler to generate the static schedule.
Therefore, everything that can be done to decrease the MET is going to have a direct
effect on the schedulability of the prototype. It would be useful to keep track, at run time,
of the real amount of time needed by each operator, so that feedback
could be given to the user about its real MET for further update of the Software Base.
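
One way such run-time tracking might look is sketched below. This is an assumption-laden illustration, not part of CAPS: the class and method names are hypothetical, operators are modeled as plain Python callables, and wall-clock time from `time.perf_counter` stands in for whatever clock the run-time system would provide. The longest observed time is only evidence about the MET, not a guaranteed upper bound.

```python
import time


class ExecutionTimeMonitor:
    """Records the longest observed execution time of each operator."""

    def __init__(self):
        self.observed_max = {}  # operator name -> longest run seen, in seconds

    def run(self, name, operator, *args):
        """Run the operator once and update its longest observed time."""
        start = time.perf_counter()
        result = operator(*args)
        elapsed = time.perf_counter() - start
        self.observed_max[name] = max(self.observed_max.get(name, 0.0), elapsed)
        return result

    def report(self, declared_met):
        """Pair each declared MET with the longest time observed so far."""
        return {name: (declared_met.get(name), observed)
                for name, observed in self.observed_max.items()}


monitor = ExecutionTimeMonitor()
monitor.run("adder", sum, [1, 2, 3])
# Maps "adder" to (declared MET, longest observed time), both in seconds.
print(monitor.report({"adder": 0.5}))
```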

6.  The  Uninitialized  Sampled  Stream  Problem 

Suppose there is a non-time critical operator (NTC) connected to a time critical
operator (TC) by a sampled stream. Clearly, the TC operator may be fired at least once
before the NTC operator, and therefore it will read garbage from the sampled stream.

This  problem  is  aggravated  in  distributed  scheduling,  as  shown  by  the  example  in 
Figure  7.1. 


Figure 7.1. The Uninitialized Sampled Stream Problem

Note that this example does not cause any problem in the uni-processor case, but
in distributed scheduling, if OP1 and OP2 are assigned to different processors, OP2 may
fire before OP1, and an uninitialized sampled stream will be read. A proposed solution
would be to force the sampled stream to be declared as a state stream whenever an initial
value is needed.

7.  State  Stream  versus  Data  Flow 

It does not make sense to have an operator TRIGGERED BY ALL X, if X is, for
example, a state stream. The reason for this is that values carried by state streams should
always be available, whereas in a data flow stream the value is consumed after it is read and
is no longer available. A warning should therefore be given if this happens in a PSDL program.

C. CONCLUSIONS


This dissertation shows that hard real-time systems and, more specifically, hard
real-time  scheduling,  are  areas  which  are  far  from  being  totally  explored.  The  next 
generation  of  hard  real-time  systems  will  be  extremely  large,  complex,  and  most  certainly 
distributed.  They  will  be  truly  distributed,  without  any  need  for  synchronization  among 
processors. 

Most of the work so far in this area has concentrated on finding better
scheduling algorithms, without examining the real need for synchronization.
Missed deadlines are always attached to data not being generated or consumed in a timely
fashion. This dissertation is the first work ever done in the area of distributed scheduling
without any explicit synchronization, and it is hoped that it will mark a turning point in the
distributed scheduling field. It is far from being complete, but it does provide a totally
different perspective on the distributed scheduling problem.

Finally,  this  dissertation  offers  the  following  scientific  contributions: 

1)  A  new  model  for  distributed  scheduling  without  synchronization; 

2)  Several  theorems  on  the  schedulability  of  periodic  and  sporadic  task  sets, 
improving  the  state  of  the  art  in  the  scheduling  field; 

3) A general Timing Model for Prototyping Systems, which will enable interaction
with  different  time  references,  keeping  total  consistency  throughout  the  design; 

4)  A  method  for  optimizing  the  schedule  length  of  periodic  task  sets.  This 
approach will decrease the time spent in scheduling and improve the chances of
finding  a  feasible  schedule; 

5) The adaptation of recent theoretical results in scheduling to the model in this
work, in order to support a systematic and formal method
for the design, synthesis, and validation of timing constraints in hard real-time
systems.




More specifically related to CAPS, the following contributions can be listed as
additional results of this dissertation:

1) Enhancement of the existing CAPS Prototyping System with a new Distributed
Scheduler with:

• allocation capability

• increased reliability

• better schedulability

• an expert mode

2) A Random PSDL Graph Generator.